CHAPTER I
THE PROBLEM 


In recent years much attention has been directed to methods of predicting the quality of a student's college work in advance of college entrance. Investigations show that satisfactory adjustment to college, as measured in terms of instructors' grades, is the resultant of many factors, most of which fall into two general groups. The first group comprises the individual's background: his intellectual endowment, his social and emotional adjustments, his habits and methods of work, his achievements in high school, his special interests and abilities, and his system of values; in short, all those phases of growth and development that have affected his personality before he became a college student. The second group includes those influences which are brought to bear on the student after college enrollment: the traditions of the campus, the new freedom, the program of extracurricular activities, the quality of college instruction, the adaptability of the curriculum to the student's individual needs, the available guidance service, and the equipment and management of classrooms, libraries, laboratories, and health clinics.

When the freshman stands on the threshold of his college experience, what equipment does he bring with him that will suggest a basis for predicting his academic achievement? Thus far the criteria most frequently used in predicting college success are high school scholarship and college aptitude tests, that is, those factors which form the college student's background. However, the predictions derived from these sources are not completely reliable. By no means, for example, are all the brilliant high school students with high rankings on the college aptitude test highly successful in college. Apparently academic adjustment in college is conditioned by forces and patternings of qualities as indefinite, varied, and complex as life itself.

This investigation is concerned with some 
of those elements in the student’s background 
that are related to his variation from a 
standard of academic achievement in college 
predicted for him on the basis of his aptitude 
and his scholastic achievement in high school. 


I. SPECIFIC STATEMENT OF THE PROBLEM


Statement of the Problem. Specifically, the 
problem of this investigation is to answer the 
following question: 

What are some of the factors underlying 
the unpredicted academic achievement of 
college freshmen? 


II. DELIMITATIONS 


Ordinarily, when a student's academic adjustment is under consideration, reference is made to his grade point average as low, average, or high as compared with a standard that has been adopted arbitrarily for the institution. For the purpose of this investigation, academic adjustment is defined in terms of the comparability of the student's achievement, as measured by instructors' grades received during his freshman year, with the achievement indicated for him by a prediction equation. This equation was derived from the scholarship records for the last three years in high school,¹ the percentile ranks on the American Council on Education Psychological Examination for College Freshmen, and the freshman scholarship records of a large sample of students who entered in September and completed one continuous year of work in San Diego State College during either 1934-1935 or 1935-1936.

1 The majority of freshmen in the population studied come from senior high schools. Hence transcripts do not include the record of work taken in the ninth grade.

A delimitation of this investigation is also recognized in the use of school marks as instruments of measurement. Frasier and Heilman,² in their calculations of correlations between the Thorndike Intelligence Examination and college achievement, found an average coefficient of correlation of 0.45, with a range from 0.24 to 0.57, when instructors' subjective marks were used, and a coefficient of 0.60, with a range from 0.46 to 0.69, when objective achievement examinations were administered. In the absence of objective measurements, the author makes use of instructors' marks, adopting the point of view expressed by Toops and Kuder, who state:


Admittedly, grades are a sorry measure 
of the success with which an individual 
meets the college situation. Yet these have 
been used universally as criteria for lack 
of anything better. Grades are a hodge- 
podge of many characteristics of the 
individual, the instructor, the course or 
courses taken, and the situation.³


This investigation is further delimited to a consideration of those background characteristics, adjustments, and achievements of students as measured by the scholarship grade point average for the last three years of high school and for the first year of college;⁴ a vocational questionnaire; the American Council on Education Psychological Examination for College Freshmen, 1931, 1934, and 1936 editions; the Bell Adjustment Inventory; the Sones-Harry High School Achievement Test, Form A; the Shank Tests of Reading Comprehension, Test III, Form C; the Progressive Mathematics Tests, Advanced Form A, Tests III and IV; and the Barrett-Ryan English Test, Form XII.


The population of the groups upon which this investigation was centered was limited to students who had been in full-time attendance at the San Diego State College.


2 George W. Frasier and J. D. Heilman, "Experiments in Teacher-College Administration, III: Intelligence Tests," Educational Administration and Supervision, XIV (April, 1928), 276.

3 Herbert A. Toops and G. Frederick Kuder, "Psychologi…," Review of Educational Research, V, No. 3 (June, 1935), 21

4 While it is recognized that a grade point average obscures a student's success in certain subjects, it is generally used in institutions of higher education as the criterion of scholastic success. See discussion of its application in David Segel, Prediction of Success in College (Office of Education Bulletin No. 15; Washington, D. C.: Office of Education, 1934), pp. 52-56.

While it was recognized that sex differences operate to produce differences in college achievement between men and women,⁵ no segregation of the sexes was made in this study. This procedure was necessary because the number of subjects in each group would otherwise have been too small for statistical analysis.


The experimental group was limited to students who had completed their senior high school course in the usual three-year period, who transferred directly to college, and who enrolled in twelve or more units of college work each semester during the freshman year. This delimitation was adopted so that the findings might be based on data concerning typical college freshmen.

5 Wagner reported that the number of boys who deviated negatively from their predicted college grades exceeded the number who deviated positively, whereas the opposite was true for the girls. See Mazie Earle Wagner, "Studies in Academic Motivation," Studies in Articulation of High School and College (University of Buffalo Studies, XIII; Buffalo, N. Y.: University of Buffalo, 1936), p. 192.

6 While neither the completion of the senior high school course in less than three years nor the taking of an additional postgraduate year has been shown by investigators to influence college achievement, it has been shown that students who go to college after spending some time out of school are remarkably successful in college considering their high school achievement. Wagner, op. cit., p. 222.


CHAPTER II


MATERIALS, SUBJECTS, AND STATISTICAL
PROCEDURE USED IN
THE INVESTIGATION


I. INTRODUCTION


For the purpose of this investigation, only measures previously developed by other workers were used, and the limitations of such measures are fully recognized. Likewise, established statistical procedures were employed throughout. This chapter describes the sources and nature of the data and the statistical procedures used in this investigation.


II. MATERIAL


Source and Type of Data. The original data were obtained from the office of the Registrar of the San Diego State College. These consisted of (1) the scholastic records of the experimental population for their last three years in high school; (2) the scholastic records of the experimental group for their first year of college work; (3) a vocational questionnaire answered by each subject at the time of registration; (4) the percentile scores on the American Council on Education Psychological Examination for College Freshmen, 1930, 1931, and 1934 editions for the prediction group, and the 1936 edition for the experimental group; (5) scores on the Bell Adjustment Inventory; and (6) scores on each of four achievement examinations, namely:


a. Barrett-Ryan English Test, Form XII.


b. Progressive Mathematics Tests, Advanced Form A: Test 3, Mathematical Reasoning, and Test 4, Mathematical Fundamentals.


c. Shank Tests of Reading Comprehension, Test III, Form C.


d. Sones-Harry High School Achievement Test, Form A.


All of these tests and the vocational ques- 
tionnaire had been administered in accordance 
with standard procedures during the regular 
activities of registration at the beginning of 
the school year in September, 1936. 


Selection of Material. These materials were selected because of both their availability and their practicability. For several years the Committee on Tests and Measurements of the Faculty of the San Diego State College had been endeavoring to select a list of standardized tests which could be readily administered to all freshmen with a minimum of effort and expense and with a maximum of value to administrators and personnel workers. From an evaluation of the experience gained in administering the various tests and using their data, and from judgments made concerning available tests, the committee and the administration adopted the list presented above. The validity of the judgments made in the original selection and adoption of the tests was therefore assumed. It was further assumed that the potential practical value of the investigation would be enhanced if materials already incorporated in the administrative and guidance procedures of the institution were used. At the same time, it was recognized that this assumption would necessarily delimit the study to the measures already in use, and to the validity and reliability of those measures. This procedure was considered justifiable on the grounds that an investigator in the field selected must necessarily use those tools which have already been developed.


UNPREDICTED SCHOLASTIC ACHIEVEMENT 





III. THE SUBJECTS

Experimental Populations. Since one of 
the objectives of the study was to acquire 
information and insights that would be useful 
in directing changes in administrative, per- 
sonnel, and curricular practices in the San 
Diego State College, the experimental popu- 
lations were selected entirely from that insti- 
tution. Two different groups were involved, 
namely, the experimental group, and the 
group used for the purpose of developing the 
prediction formula which was to be applied 
to the experimental group. 


Experimental Group. The experimental group consisted of all beginning freshmen who enrolled in the college for twelve or more units of work in September, 1936, and who had completed a year's work by June, 1937, with a minimum of twelve units each semester. The group was further limited to those students who had completed their regular senior high school course in three years, had not taken postgraduate work in high school, and had transferred directly from high school to college. The group selected in this way had a membership of three hundred and eighty-two.

Prediction Group. The group used for the development of the prediction formula consisted of a random sample of six hundred members of the freshman classes of September, 1934, and September, 1935. The criteria applied in the selection of members of the experimental group were also used for this group. However, instead of taking the whole of one year's freshman class alone or the total of both years' classes, the first three hundred names on an alphabetically arranged list for each class were selected. This method provided a more representative sample of subjects for use in prediction in the institution than would have been obtained had the group been limited to a single year's class. Three hundred members of each of the two classes were used because that was the maximum number that could be secured for the class of 1934.


IV. STATISTICAL PROCEDURES 
A. Scores Used 


High School and College Scholarship Averages. Most of the high school transcripts of record employed the same five-point marking system as that used in the college, namely, A, B, C, D, and F. In every case, transcripts which carried other types of marking symbols supplied a transmutation table, which was applied in order to express grades in symbols on the letter scale. For statistical use these marks were assigned the numerical weightings of 4, 3, 2, 1, and 0, respectively. Numerical weightings for college marks were 3, 2, 1, 0, and -1 for grades A, B, C, D, and F, respectively.¹ After the assignment of weightings, the grade point ratios or averages for scholarship in both high school and college were computed for each subject by dividing the total grade points earned by the number of units attempted. All weightings and calculations of grade point averages were made by the regularly employed personnel of the Registrar's office, supplemented by trained assistants. All work was checked for accuracy by a trained statistician.
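The averaging rule just described can be sketched in Python. This is a minimal illustration, not the Registrar's procedure: it assumes each course is recorded as a hypothetical (letter mark, units attempted) pair and that a course's grade points are its weighting multiplied by its units. Running one record through both weighting systems also shows why the choice of system is immaterial (footnote 1): every weighting differs by exactly one point, so the two averages always differ by exactly one.

```python
# Weightings stated in the text for the five-point letter scale.
HIGH_SCHOOL_WEIGHTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
COLLEGE_WEIGHTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def grade_point_average(courses, weights):
    """Total grade points earned divided by total units attempted.

    `courses` is a hypothetical list of (letter_mark, units) pairs; grade
    points for a course are assumed to be weighting * units.
    """
    total_points = sum(weights[mark] * units for mark, units in courses)
    total_units = sum(units for _, units in courses)
    return total_points / total_units


# Hypothetical record: an A in a 3-unit course, a C in a 3-unit course,
# and a B in a 2-unit course.
record = [("A", 3), ("C", 3), ("B", 2)]
hs_scale = grade_point_average(record, HIGH_SCHOOL_WEIGHTS)
college_scale = grade_point_average(record, COLLEGE_WEIGHTS)
print(hs_scale, college_scale)  # 3.0 2.0
```

Because the college scale is simply the high school scale shifted down by one point, a correlation computed with either system is the same.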

Psychological Examination Scores. Percentile ranks were used instead of raw scores on the five separate tests of the American Council on Education Psychological Examination for College Freshmen. This procedure was followed because it made possible the inclusion of scores on three different editions of the examination in one distribution, which was necessary when developing the prediction formula. Since the prediction formula was based in part upon percentile ranks, the scores on the 1936 edition to which the formula was applied had likewise to be expressed as percentiles.

The Bell Adjustment Inventory Scores. 
Raw scores on each of the four parts of the 
test were recorded, namely, Home Adjust- 
ment, Health Adjustment, Social Adjustment, 
and Emotional Adjustment. 

Achievement Test Scores. The raw scores on the Barrett-Ryan English Test and on the Shank Tests of Reading Comprehension were used. Gross scores were recorded for Test 3, Mathematical Reasoning, and Test 4, Mathematical Fundamentals, of the Progressive Mathematics Tests, Advanced Form A, and for the total and the four sections of the Sones-Harry High School Achievement Test, namely, Language and Literature, Mathematics, Natural Science, and Social Science.


1 Different weightings were used for the high school grades to avoid the extra effort involved in using negative numbers in calculation. Additional time was saved by retaining, for college grades, the weighting involving a negative number, because these averages had already been determined by the Registrar's office in accordance with customary practice. This procedure was justified on the ground that a different system of weightings had no effect upon the accuracy of the statistical treatment: the two systems differ only by a constant of one grade point, and adding a constant to every score changes the mean but not the standard deviation or any coefficient of correlation.

B. Tabulation of the Data


For Computing Regression Equation (the Prediction Formula). Data from the Registrar's records were entered on three-by-five cards for each of the six hundred cases. After proper sorting, entries were made in the appropriate cells of correlation charts.

For Original Data Concerning the Experimental Group. All data secured from the Registrar's records, together with the predicted grade point average calculated by means of the regression equation for each subject and the group classification of each subject, derived as described below,² were entered on master original data sheets, samples of which will be found in the Appendix of this treatise.


C. Formulas 


All formulas were taken from standard 
textbooks in educational statistics.³


D. Combination of Criteria in 
Regression Equation 


Regression equations have been used by 
various investigators in an effort to improve 
predictive efficiency over that possible by the 
use of a single factor. They generally agree 
that more reliable predictive indices can be 
derived from a combination of two or more 
factors than from one of them alone. 

Douglass⁴ found, after calculating eighteen different multiple correlation coefficients, that the highest coefficients from two variables are obtained from the use of high school marks and percentile rank on the American Council on Education Psychological Examination for College Freshmen. He reported a multiple correlation of .626, and also found that a third variable adds little to the predictive merit of a combination of two variables. In his survey of ten reliable studies which reported the use of various combinations of two or more prognostic variables, Douglass⁵ found general agreement with his own conclusions. These studies showed a median multiple correlation of .61, while the median correlation for high school marks and intelligence was .58.
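For two predictors, the multiple correlation that Douglass and the other investigators report can be computed from the three pairwise coefficients with the standard textbook formula used below. The coefficients in the example are hypothetical, chosen only to illustrate the arithmetic; they are not taken from any of the studies cited.

```python
import math


def multiple_correlation(r_y1, r_y2, r_12):
    """Multiple correlation R of a criterion y with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors.
    Uses R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2).
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical: high school marks r = .50 with college grades,
# test percentile r = .45 with college grades, r = .40 between the two.
R = multiple_correlation(0.50, 0.45, 0.40)
print(round(R, 3))
```

In this example R comes out at about .57, higher than either single coefficient, which is the pattern the studies above describe.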


2 See Chapter III, infra.

3 Henry E. Garrett, Statistics in Psychology and Education (New York: Longmans, Green and Company, 192…); and Karl J. Holzinger, Statistical Methods for Students in Education (Chicago: Ginn and Company, 1928).

4 Harl R. Douglass, The Relation of High School Preparation and Certain Other Factors to Academic Success at the University of Oregon (University of Oregon Publication, III, No. 1; Eugene, Oregon: University of Oregon, September, 1931), p. 49.

5 Ibid., p. 50.
Wagner⁶ agrees with Douglass that a combination of the better measures is usually found to be more predictive than the best single one. She supports her statement by showing a median multiple correlation coefficient of .67 for the twenty-four authors whom she lists. The median multiple correlation coefficient obtained for studies using high school marks and intelligence test scores⁷ was .57.

Symonds, in summarizing four independent 
studies by May, Wood, Johnston, and 
Symonds, states: 


In each of these four studies both the 
intelligence test and high school marks 
supplement each other so that taken in 
combination they predict college success 
better than either one singly.⁸


Symonds concludes:


That no one factor predicts college suc- 
cess adequately and that any criterion of 
college success should be the composite of 
several objectively measurable factors.⁹


Odell,¹⁰ after indicating an expected range in correlation from 0.40 to 0.50 between intelligence test score and college scholarship, states that a combination of test score and high school marks may be expected to yield correlations of about 0.60 or higher with college grades.


Brammel,¹¹ in his survey of prediction studies, also points out that many investigators have obtained better results by employing a combination of criteria than by confining prediction to one only.


A. B. Crawford¹² reported correlations from 0.68 to 0.74 for a combination of transmuted high school scholarship, College Entrance Examination Board averages, scholastic aptitude test scores, and age at entrance.


6 Mazie Earle Wagner, "A Survey of the Literature on College Performance Prediction," Studies in Articulation of High School and College (University of Buffalo Studies, IX; Buffalo, N. Y.: University of Buffalo, 1934), p. 198.

7 Ibid., p. 8

8 Percival M. Symonds, Measurement in Secondary Education (New York: The Macmillan Company, 1928), p. 423.

9 Ibid., p. 425.

10 Charles W. Odell, Predicting the … Success of College Freshmen (Bureau of Educational Research, Bulletin …; Urbana: University of Illinois, 1927), p. 18.

11 P. Roy Brammel, "Articulation of High School and College," The Reorganization of Secondary Education (Office of Education Bulletin, No. 17, 1932, National Survey of Education Monograph No. 10; Washington, D. C.: Office of Education, 1933), p. 25.

12 A. B. Crawford, "… Freshmen Achievement," School and Society, XXXI (January 25, 1930), 125-132.
In view of the findings of Douglass and other investigators, and because of the availability of data, the combined criteria of high school marks and percentile scores on the American Council on Education Psychological Examination for College Freshmen were used in developing the regression equation for predicting college achievement in this investigation.
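The regression equation itself is presented in Chapter III. As a hedged illustration of how a two-variable prediction formula of this kind can be fitted, the sketch below applies ordinary least squares, solving the two normal equations from deviation scores. The function name and the data are hypothetical: the data are synthetic and lie exactly on a known plane, so the fit should recover its coefficients. Nothing here reproduces the study's actual figures.

```python
from statistics import fmean


def fit_prediction_formula(hs_gpa, percentile, college_gpa):
    """Least-squares fit of college_gpa = a + b1 * hs_gpa + b2 * percentile."""
    m1, m2, my = fmean(hs_gpa), fmean(percentile), fmean(college_gpa)
    d1 = [v - m1 for v in hs_gpa]
    d2 = [v - m2 for v in percentile]
    dy = [v - my for v in college_gpa]
    # Sums of squares and cross-products of deviation scores.
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    # Solve the two normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Synthetic check data generated from college = -1.0 + 0.5*hs + 0.01*pct.
hs = [3.6, 2.4, 3.0, 2.0, 3.3, 2.7]
pct = [70, 55, 40, 35, 90, 60]
college = [-1.0 + 0.5 * h + 0.01 * p for h, p in zip(hs, pct)]

a, b1, b2 = fit_prediction_formula(hs, pct, college)
# Predicted college average for a hypothetical entrant (hs 3.2, percentile 70).
predicted = a + b1 * 3.2 + b2 * 70
print(round(a, 3), round(b1, 3), round(b2, 3), round(predicted, 3))
```

With real records the points would scatter about the plane, and the same fit would return the least-squares compromise. Each subject's obtained average minus such a predicted average is the kind of deviation on which the classification of subjects rests.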


E. Method of Classifying Subjects 
Into Groups 


This investigation was concerned with discovering the characteristics of subjects, classified as students of good promise or students of poor promise, whose work was graded by the instructors either better or poorer than was predicted in each case by the personnel workers.¹³ It therefore became necessary to define precisely the terms "good promise," "poor promise," "better than predicted," and "poorer than predicted." These terms were defined in a way that produced enough cases in the extreme groups to permit statistical study.

Wagner,¹⁴ in a study of inconsistent or unpredicted performance in college, divided her population into five groups. She did this by plotting a graph of high school and college grades, upon which she drew a trend line representing the average college grade made by students of a particular high school average. All students whose college marks were reliably above or below the average mark made by those with a given average on the New York Regents' Examination (the measure of high school achievement) were selected for her study. For the purpose of reliability she chose only those cases whose actual college grade varied from the predicted grade by more than the probable error of the estimate. Cases were classified into the five groups in terms of probable error deviations, as follows:


1. Those who obtained college marks very much better than would have been predicted, by at least two probable errors;

2. Those who obtained college marks somewhat better than would have been predicted, by one to two probable errors;

3. Those who were non-deviates, or who obtained college marks within 1 P. E. of those that would have been predicted for them;

4. Those who obtained college marks somewhat poorer than would have been predicted, by one to two probable errors;

5. Those who obtained college marks very much poorer than would have been predicted, by at least two probable errors.


13 The regression equation, as derived from data concerning high school marks and percentile scores and presented in Chapter III, provided the means for obtaining the predicted college grade point average of each subject.

14 Wagner, "Studies in Academic Motivation," Studies in Articulation of High School and College, pp. 188-192.


She called groups 1 and 2 the positive 
deviates and groups 4 and 5 the negative 
deviates. 
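Wagner's five-way rule can be restated as a small function. This is an illustrative sketch only, and the function name is hypothetical. It assumes a case's deviation is its obtained college mark minus its predicted mark, expressed in probable-error units. Because the text speaks of "at least two" and "one to two" probable errors, a case lying exactly on a boundary is assigned to the more extreme group. The probable error of .321 used in the example is this study's value, borrowed only for illustration; Wagner's own value is not given here.

```python
def wagner_group(obtained, predicted, probable_error):
    """Return Wagner's group number (1-5) for one case."""
    deviation = (obtained - predicted) / probable_error  # in P.E. units
    if deviation >= 2:
        return 1  # very much better than predicted
    if deviation >= 1:
        return 2  # somewhat better than predicted
    if deviation > -1:
        return 3  # non-deviate: within 1 P.E. of prediction
    if deviation > -2:
        return 4  # somewhat poorer than predicted
    return 5  # very much poorer than predicted


PE = 0.321  # illustrative value only
for obtained in (2.0, 1.4, 1.0, 0.6, 0.0):
    print(obtained, wagner_group(obtained, 1.0, PE))
```

Groups 1 and 2 together form her positive deviates, and groups 4 and 5 her negative deviates.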

The method of classifying subjects into 
groups in this investigation differed from 
Wagner’s in several respects, namely, (a) in 
the application of combined criteria for pre- 
diction, instead of a single criterion, (b) in 
setting the limits of groups in terms of 
standard deviation instead of probable error, 
and (c) in grouping subjects not only on the 
basis of obtained college marks that were 
better or poorer than predicted, but also on 
the basis of the quality of the predicted col- 
lege marks. The difference mentioned first 
was due to an effort to improve prediction; 
the second has no real significance, being 
merely a matter of choice between statistical 
formulae; and the third difference was 
justified on the basis that the segregation of 
the deviates into subgroups might serve to 
disclose more facts than could come to light 
when all students of all degrees of promise 
were grouped together. Consequently, a 
classification procedure was devised which 
would arrange subjects in groups large enough 
for statistical calculation, both in terms of a 
scholarship quality scale and of deviation from 
prediction. 

The procedure involved the following 
steps: 


1. A graph was plotted showing, for each subject, the predicted college grade point average and the difference between the predicted and the obtained college grade point averages.

2. This graph was divided into nine sections, each containing approximately the same number of cases. The division was made by setting the limits of the middle group on the X axis at plus one-half sigma and minus one-half sigma, and those of the middle group on the Y axis likewise at plus one-half sigma and minus one-half sigma. On each axis the upper group extended from plus one-half sigma to the top of the range, and the lower group from minus one-half sigma to the bottom of the range.

3. Each of the nine groups of subjects was assigned a code number and described as follows:¹⁵


1A: Good positive deviates. Those who were predicted to do good college work but did better.

1B: Good non-deviates. Those who were predicted to do good work and obtained the record predicted.

1C: Good negative deviates. Those who were predicted to do good work but did poorer.

2A: Average positive deviates. Those who were predicted to do average work but did better.

2B: Average non-deviates. Those who were predicted to do average work and obtained the record predicted.

2C: Average negative deviates. Those who were predicted to do average work and did poorer.

3A: Poor positive deviates. Those who were predicted to do poor work but did better.

3B: Poor non-deviates. Those who were predicted to do poor work and obtained the record predicted.

3C: Poor negative deviates. Those who were predicted to do poor work and did poorer.
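A quick calculation shows why cutting each axis at plus and minus one-half sigma yields sections of roughly similar size. The sketch assumes, purely for illustration, that each distribution is normal and that the two variables are independent. The study claims neither, so the actual counts in its nine sections may differ.

```python
import math


def normal_cdf(z):
    """Cumulative proportion of a unit normal distribution below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Proportion of cases in each of the three bands on one axis.
lower = normal_cdf(-0.5)                     # below minus one-half sigma
middle = normal_cdf(0.5) - normal_cdf(-0.5)  # within one-half sigma
upper = 1 - normal_cdf(0.5)                  # above plus one-half sigma

# Under the independence assumption, each section's share is a product.
sections = {
    numeral + letter: p_row * p_col
    for numeral, p_row in (("1", upper), ("2", middle), ("3", lower))
    for letter, p_col in (("A", upper), ("B", middle), ("C", lower))
}
for code, share in sorted(sections.items()):
    print(code, f"{share:.1%}")
```

Each band holds roughly a third of the cases (about 31, 38, and 31 per cent), so under these assumptions the nine sections range from about 9.5 per cent at the corners to about 14.7 per cent in the centre.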


It will be noted that the term "average work" was defined arbitrarily as that represented by a grade point average falling between plus one-half sigma and minus one-half sigma of the distribution of predicted college grade point averages, and that "good work" and "poor work" represented grade point ratios falling above plus one-half sigma and below minus one-half sigma, respectively. It will be noted also that the term "obtaining the record predicted" was defined arbitrarily as the achievement of a grade point average whose difference from the predicted average fell between plus one-half sigma and minus one-half sigma of the distribution of these differences for the total population. Students classified as "better than predicted" or "poorer than predicted" were therefore described as having achieved college averages whose differences from those predicted fell above plus one-half sigma or below minus one-half sigma, respectively, of the distribution of the differences between the predicted and the obtained college grade point averages for the whole population.


15 The code designations and limits of the groups are shown diagrammatically in Chart 1, page 20.


CHART 1


LIMITS OF THE SUB-GROUPS OF THE
TOTAL EXPERIMENTAL POPULATION

It will be further noted that in the group code numbers the numerals 1, 2, and 3 signify the quality of the predicted college marks as good, average, and poor, respectively, while the letters A, B, and C indicate variation from prediction: better than predicted (positive deviates), as predicted (non-deviates), and poorer than predicted (negative deviates), respectively.

It will be readily seen that each code num- 
ber carries a key to the description of a cer- 
tain group. For example, 1A represents the 
group of students predicted to do good work 
whose performance exceeded that predicted. 
The code number 1C signifies a group pre- 
dicted to do good work but who fell short 
of the estimate. 
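The classification rule can be sketched as follows. Several details are assumptions rather than statements from the text, and the function name is hypothetical. The limits are measured from the mean of each distribution, and sigma is taken as the population standard deviation. The deviation is computed as obtained minus predicted, so a positive value means "better than predicted". A value falling exactly on a limit is counted in the middle group.

```python
from statistics import fmean, pstdev


def classify(predicted, obtained):
    """Assign each subject a code number such as '1A' or '3C'."""
    diffs = [o - p for p, o in zip(predicted, obtained)]
    p_mean, p_sd = fmean(predicted), pstdev(predicted)
    d_mean, d_sd = fmean(diffs), pstdev(diffs)

    codes = []
    for p, d in zip(predicted, diffs):
        # Numeral: quality of the predicted college marks.
        if p > p_mean + 0.5 * p_sd:
            numeral = "1"  # good
        elif p < p_mean - 0.5 * p_sd:
            numeral = "3"  # poor
        else:
            numeral = "2"  # average
        # Letter: variation of obtained marks from prediction.
        if d > d_mean + 0.5 * d_sd:
            letter = "A"  # positive deviate
        elif d < d_mean - 0.5 * d_sd:
            letter = "C"  # negative deviate
        else:
            letter = "B"  # non-deviate
        codes.append(numeral + letter)
    return codes


# Six hypothetical subjects.
predicted = [2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
obtained = [2.5, 1.5, 1.0, 1.0, 0.5, -0.5]
codes = classify(predicted, obtained)
print(codes)
```

The total positive deviates (group A) are then simply all subjects whose code ends in "A", and the total negative deviates (group C) all whose code ends in "C".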

The setting of the lower limit of the A groups at plus one-half sigma, and of the upper limit of the C groups at minus one-half sigma, of the distribution of the differences between the obtained and predicted averages was justified by the fact that this investigation was concerned primarily with the extreme groups, between which there was relatively little probability of overlapping. Since the two groups were separated by one sigma of this distribution, and since one sigma was approximately equal to 1.41 times the probable error of the estimate,¹⁶ the chances were only about one in three that any true score in either group would fall in the other. The setting of the lower and upper limits of groups 1 and 3, respectively, was justified in like manner, with the chances of one to one¹⁷ that any true score in either group would fall in the other.¹⁸
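The figure of about 1.41 can be checked against the values the footnote reports: a sigma of .45 for the distribution of differences and a probable error of the estimate of .321. The sketch simply divides one by the other. It does not attempt to reproduce the author's chances of overlap, which rest on assumptions the text does not spell out.

```python
sigma_of_differences = 0.45         # as reported in the footnote
probable_error_of_estimate = 0.321  # as reported in the footnote

# The A and C groups lie one sigma apart on the distribution of differences;
# express that separation in probable-error units.
separation_in_pe_units = sigma_of_differences / probable_error_of_estimate
print(f"{separation_in_pe_units:.2f}")
```

The quotient is about 1.40, in line with the approximate value of 1.41 given in the text.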


F. Method Used to Compare Sub-Groups on 
the Measures Applied 


Throughout this treatise the groups are presented in the following paired combinations:


Group                                         Group
1A  predicted good positive deviates    with  1C  predicted good negative deviates
1A  predicted good positive deviates    with  3A  predicted poor positive deviates
1A  predicted good positive deviates    with  3C  predicted poor negative deviates
1C  predicted good negative deviates    with  3A  predicted poor positive deviates
1C  predicted good negative deviates    with  3C  predicted poor negative deviates
3A  predicted poor positive deviates    with  3C  predicted poor negative deviates
A   predicted total positive deviates   with  C   predicted total negative deviates


It will be observed that these paired group- 
ings represent every possible combination of 
the extremely deviating sub-groups of the 
total experimental population and that the 
middle groups have been omitted, except when 
presented in combination with the total 
grouping of positive or negative deviates. 
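The claim that these pairings exhaust every combination of the extreme sub-groups is easy to confirm: four extreme groups taken two at a time give exactly six pairs. The sketch below uses the standard library's itertools.combinations, which happens to list the pairs in the same order as the table. The pairing of the total positive and total negative deviates is appended by hand.

```python
from itertools import combinations

extreme_groups = ["1A", "1C", "3A", "3C"]
pairs = list(combinations(extreme_groups, 2))
pairs.append(("A", "C"))  # total positive deviates with total negative deviates
for first, second in pairs:
    print(first, "with", second)
```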

The reliability of the differences between 
the various sub-groups on the various meas- 
ures was determined by use of the usual 
formula applied when computing the standard 
error of the differences between their means. 
In order to expedite the making of compari- 
sons between groups the reliability of the dif- 
ferences betwen the means for each measure 
was expressed in terms of the ratio of the 
difference to its standard error. Thus the 

²⁶ The sigma of the distribution was .45. The probable error of the estimate was .321.

²⁷ The sigma of the distribution was .45. The probable error of the estimate was .321.

²⁸ As will be noted in Chapter III, this procedure resulted in the classification of subjects into groups which showed completely reliable statistical differences on the measures used in their organization.
probability that the differences were greater than zero could be directly ascertained by inspection of the ratio. According to general practice, a ratio of three is accepted as indicative of complete reliability.²⁹ Therefore, any
value of less than three obviously falls short 
of complete statistical reliability, and any 
value greater than three indicates added 
reliability. Furthermore, once a critical ratio 
is obtained whose magnitude is less than three, 
the chances in one hundred that the difference 
between the means of the measures is greater 
than zero can be readily read from a 
statistical table.³⁰
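The comparison procedure just described can be illustrated with a short Python sketch. It assumes the usual formula for the standard error of the difference between two uncorrelated means, S.E.diff = √(σ₁²/N₁ + σ₂²/N₂). It also treats the "chances in one hundred" as the one-tailed normal probability that the true difference lies in the observed direction, which is the quantity a table such as Garrett's supplies. The function names are illustrative only. The figures are those reported later for groups 1A and 1C on predicted college averages (Table VII).

```python
import math


def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean: sigma / sqrt(N)."""
    return sd / math.sqrt(n)


def se_of_difference(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference between two uncorrelated means."""
    return math.sqrt(se_of_mean(sd1, n1) ** 2 + se_of_mean(sd2, n2) ** 2)


def critical_ratio(mean1: float, mean2: float, se_diff: float) -> float:
    """Ratio of the difference between the means to its standard error."""
    return (mean1 - mean2) / se_diff


def chances_in_100(ratio: float) -> float:
    """One-tailed normal probability (x100) that the true difference
    is greater than zero in the direction observed."""
    return 100 * 0.5 * (1 + math.erf(abs(ratio) / math.sqrt(2)))


# Groups 1A and 1C on predicted college averages (mean, sigma, N from Table VII)
se_d = se_of_difference(0.18, 35, 0.14, 31)
ratio = critical_ratio(1.64, 1.53, se_d)
print(f"S.E.diff = {se_d:.2f}, ratio = {ratio:.2f}, "
      f"chances = {chances_in_100(ratio):.1f} in 100, "
      f"completely reliable: {abs(ratio) >= 3}")
```

With these inputs the sketch gives a standard error of 0.04, a ratio of 2.79, and 99.7 chances in 100. This agrees with the first row of Table IX, and the ratio still falls short of the conventional value of three.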


CHAPTER III 
STATISTICAL PREPARATION OF DATA 


In this chapter will be presented descrip- 
tions of the populations of the various groups 
of subjects involved in this investigation, 
namely, the prediction group, the total experi- 
mental group, and the sub-groups of the ex- 
perimental group. This chapter will give 
consideration to the statistical preparation of 
the data preliminary to the analysis of the 
characteristics of the sub-groups in terms of the
measures used in this study. 


Description of the Prediction Population. 
The population consisted of six hundred 
freshmen who had completed one year of col- 
lege work selected as described in Chapter II.
They represented a reasonably valid sampling 
as indicated by the distribution of the three 
measures applied, namely, college grade point 
average, high school grade point average, and 
American Council percentile scores, as shown 
in Table I. 


²⁹ Garrett, op. cit., p. 133.

³⁰ For the table used in this investigation see Garrett, op. cit., p. 134.

Coefficients of Correlation. Scatter diagrams were prepared with the ranges divided into an appropriate number of intervals in each case as follows: high school grade point averages, fourteen; college grade point averages, eighteen; and American Council percentile scores, twenty. Following the usual procedure, the six hundred individual data cards were sorted into piles representing the various cells, and counted. The number in each cell was entered in the proper cell of the scatter diagram. Zero order coefficients of correlation were obtained in the usual manner by use of the Pearson product-moment formula. The probable errors of the coefficients were taken from a statistical table.¹ The coefficients of correlation and their probable errors, thus determined, are given in Table II. By referring to this table it may be seen that all the coefficients of correlation obtained were approximately equal to the medians for similar measures as reported in other investigations.²
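For readers who wish to check the coefficients, the Pearson product-moment formula and the probable error of r may be sketched as follows. This is an illustration only. It works from ungrouped paired scores rather than from the grouped scatter diagrams used in the study, so on real data the results would differ slightly by grouping error. The probable-error expression 0.6745(1 − r²)/√N is the standard textbook formula, and it is assumed here to be the one behind the table that was consulted.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment coefficient from paired raw scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Check the probable errors reported in Table II (N = 600)
for label, r in [("college vs. high school", 0.524),
                 ("college vs. American Council", 0.409),
                 ("high school vs. American Council", 0.437)]:
    print(f"{label}: r = {r:.3f} +/- {probable_error_r(r, 600):.3f}")
```

The three probable errors come out at .020, .023, and .022, which matches the values printed in Table II.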

The multiple correlation coefficient for the 
two prognostic variables, namely high school 
averages and American Council percentile 
scores, was .561. Since the coefficient of cor- 
relation between college averages and the 
single variable of high school averages was 
.524, it would appear that the addition of a 
second variable in the computations adds 
little to the accuracy of prediction. In view 
of the slight increase in predictive accuracy 
of the combined criteria over the single 
criterion, and because of the large amount of 
labor required for computing the multiple 
correlation coefficient, it would seem that for 
general practical purposes, as distinguished 
from research, the use of the single prediction 
criterion of high school averages should 
suffice. 
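The size of the gain from adding the second variable can be verified directly from the three zero-order coefficients of Table II. The sketch below assumes the standard formula for the multiple correlation of one variable with two predictors; the function name is illustrative.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of variable 1 with variables 2 and 3,
    computed from the three zero-order coefficients."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


# Zero-order coefficients from Table II
r12 = 0.524  # college average with high school average
r13 = 0.409  # college average with American Council percentile
r23 = 0.437  # high school average with American Council percentile

big_r = multiple_r(r12, r13, r23)
print(f"R = {big_r:.3f}; gain over high school average alone = {big_r - r12:.3f}")
```

The result, R = .561, reproduces the reported value. The gain of about .04 over the single coefficient of .524 illustrates why the single criterion was judged sufficient for practical use.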

Values Derived for the Regression Equation. The formulae and procedure described by Holzinger³ were employed with the data contained in Table I and Table II to compute the values for the regression equation

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + C$$

where X₁ signifies the predicted college grade point average; X₂, the high school grade point average; X₃, the American Council percentile score; and C, a constant. With the computations thus made, the regression equation reads

$$X_1 = .446X_2 + .005X_3 - .294$$

The standard error of the estimate rendered was .476 and the probable error of the estimate was .321.

¹ Garrett, Statistics in Psychology and Education, p. 171.

² See review of the literature in Chapter II, Section D, supra.

³ Holzinger, Statistical Methods for Students in Education, p. 293.

TABLE I
MEANS, STANDARD DEVIATIONS, AND RANGES OF MEASURES OF THE PREDICTION POPULATION
(600 CASES)

Measure                                    Range            Mean     Standard deviation
1. College grade point average             -0.33 to 3.00    1.204    0.574
2. High school grade point average         1.30 to 3.96     2.745    0.549
3. American Council percentile scores      00 to 99         54.92    25.880

TABLE II
ZERO ORDER COEFFICIENTS OF CORRELATION BETWEEN MEASURES OF THE PREDICTION POPULATION

                                           3. American Council     1. College grade
                                           percentile scores       point average
1. College grade point average             .409 ± .023
2. High school grade point average         .437 ± .022             .524 ± .020
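The regression weights, the constant, and the errors of estimate can be approximately reconstructed from Tables I and II. The sketch below uses the standard three-variable formulas: standard (beta) weights obtained from the zero-order coefficients, rescaled by the standard deviations, with the constant chosen so the equation passes through the means. Holzinger's worked procedure may differ in arithmetic detail, so this is a check under that assumption rather than the author's computation.

```python
import math

# Table I: means and standard deviations of the prediction population (N = 600)
m1, s1 = 1.204, 0.574   # X1: college grade point average (criterion)
m2, s2 = 2.745, 0.549   # X2: high school grade point average
m3, s3 = 54.92, 25.880  # X3: American Council percentile score

# Table II: zero-order coefficients of correlation
r12, r13, r23 = 0.524, 0.409, 0.437

# Standard (beta) weights for the two predictors
denom = 1 - r23 ** 2
beta2 = (r12 - r13 * r23) / denom
beta3 = (r13 - r12 * r23) / denom

# Raw-score weights, and the constant that makes the equation
# pass through the three means
b2 = beta2 * s1 / s2
b3 = beta3 * s1 / s3
c = m1 - b2 * m2 - b3 * m3

# Squared multiple correlation, standard error and probable error of estimate
r_sq = beta2 * r12 + beta3 * r13
se_est = s1 * math.sqrt(1 - r_sq)
pe_est = 0.6745 * se_est

print(f"X1 = {b2:.3f} X2 + {b3:.3f} X3 {c:+.3f}")
print(f"R = {math.sqrt(r_sq):.3f}, S.E. est = {se_est:.3f}, P.E. est = {pe_est:.3f}")
```

The weights come out at .446 and .005, and the probable error at .321. The constant comes out at about -.29 and the standard error of estimate at about .475, against the reported -.294 and .476. These small gaps are presumably due to rounding in the tabled inputs. The negative sign of the constant is required for the predicted mean to equal the observed college mean of 1.204.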

Description of the Total Experimental Population. Following the procedure described in Chapter II, Section III, 382 subjects were selected: 190 men and 192 women. While it was recognized that sex differences operate to produce differences in the scholastic achievement of men and women, no segregation of the sexes was made in this study. Wagner reported that among the boys, negative deviates from predicted college grades outnumbered positive deviates, whereas the opposite was true for the girls.⁴ In view of her finding, the inclusion of men and women together in the various groupings of deviates constitutes a delimitation of this study. However, this was necessary because the number of subjects in each group would otherwise have been too small for statistical analysis.

The calculation of the means and standard deviations for the 382 cases on the distributions of college averages, high school averages, and American Council percentile scores gave the results presented in Table III.

⁴ Wagner, “Studies in Academic Motivation,” Studies in Articulation of High School and College, p. 192.

By applying the regression equation, the value of the predicted grade point average for each of the 382 cases was computed and entered upon the master data sheet.⁵ Following this, the difference between each subject’s predicted average and obtained average was computed and entered on the master data sheet.
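The two entries made on the master data sheet for each subject can be sketched as follows. This sketch makes two assumptions. First, it takes the constant of the regression equation as -.294, the sign required for the predicted mean to match the observed college mean. Second, it computes the difference as obtained minus predicted, which is consistent with Table VIII, where the positive deviates have positive mean differences. The student's scores below are hypothetical and serve only as illustration.

```python
def predicted_college_average(hs_average: float, ace_percentile: float) -> float:
    """Regression equation as reported: X1 = .446 X2 + .005 X3 - .294."""
    return 0.446 * hs_average + 0.005 * ace_percentile - 0.294


def deviation(obtained: float, predicted: float) -> float:
    """Obtained minus predicted college grade point average
    (positive = better than predicted)."""
    return obtained - predicted


# Hypothetical student: high school average 3.00, American Council percentile 70,
# obtained college average 1.80
pred = predicted_college_average(3.00, 70)
dev = deviation(1.80, pred)
print(f"predicted {pred:.2f}, obtained 1.80, deviation {dev:+.2f}")
```

For this hypothetical student, the predicted average is about 1.39 and the deviation is about +0.41.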


Description of the Sub-Groups of the 
Experimental Population. Following the pro- 
cedure as described in Chapter II, Section E 
above, the total experimental population was 
divided into nine groups. The means, stan- 
dard deviations, and the ranges of the sub- 
groups on American Council percentiles, high 
school averages, obtained college averages, 
predicted college averages, and the differences 
between obtained and predicted college averages are presented in Tables IV, V, VI, VII, and VIII, respectively. The standard errors of the differences between the means of paired groups on predicted college averages are shown in Table IX and Chart 2. Table X and Chart 3 present the standard errors of the differences between the means of the paired
groups on the differences between the pre- 
dicted college averages and the obtained col- 
lege averages. Since the major concern of 
this study is with the deviates, consideration 
will be given only to groups 1A, 1C, 3A, 3C, 
A and C. By the elimination of the non- 
deviating groups any differences in the char- 
acteristics of the extreme groups should be 
brought out more clearly. This procedure was 
tested by calculating the statistical differences 
between the means of the distributions of pre- 
dicted college averages of the groups in every 
paired combination possible. By referring to 
Table IX and Chart 2, it will be noted that 
the standard error of the difference between 
the means of groups 1A and 3A, between 1A 
and 3C, between 1C and 3C, and between 
1C and 3A is in each case less than one-fifth of the respective difference between the means.

⁵ A sample page of the master data sheet is shown in the Appendix.

TABLE III
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE EXPERIMENTAL GROUP (382 CASES)

Measure                                    Range            Mean     Standard deviation
1. College grade point average             -0.37 to 2.83    1.26     0.54
2. High school grade point average         1.36 to 4.00     2.74     0.55
3. American Council percentile score       01 to 98         46.65    26.89

(Vol. 7, No. ;

TABLE IV
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE EXPERIMENTAL POPULATION ON THE AMERICAN COUNCIL ON EDUCATION PSYCHOLOGICAL EXAMINATION (PERCENTILE SCORES)

Group or grouping              Number    Range
1A                             35        34 to 97
1B                             41        14 to 98
1C                             31        23 to 96
2A                             34        02 to 92
2B                             66        06 to 90
2C                             50        11 to 92
3A                             42        02 to 53
3B                             47        01 to 84
3C                             36        01 to 81
A, all positive deviates       111       02 to 97
B, all non-deviates            154       01 to 98
C, all negative deviates       117       01 to 96
Total population               382       01 to 98

TABLE V
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE EXPERIMENTAL POPULATION ON HIGH SCHOOL GRADE POINT AVERAGES

Group or grouping              Number    Range
1A                             35        2.71 to 4.00
1B                             41        2.81 to 3.86
1C                             31        2.74 to 3.81
2A                             34        2.21 to 3.29
2B                             66        2.29 to 3.21
2C                             50        2.11 to 3.33
3A                             42        1.48 to 2.83
3B                             47        1.36 to 2.70
3C                             36        1.41 to 2.75
A, all positive deviates       111       1.48 to 4.00
B, all non-deviates            154       1.36 to 3.86
C, all negative deviates       117       1.41 to 3.81
Total population               382       1.36 to 4.00

TABLE VI
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE EXPERIMENTAL POPULATION ON OBTAINED COLLEGE AVERAGES

Group or grouping              Number
1A                             35
1B                             41
1C                             31
2A                             34
2B                             66
2C                             50
3A                             42
3B                             47
3C                             36
A, all positive deviates       111
B, all non-deviates            154
C, all negative deviates       117
Total population               382

TABLE VII
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE EXPERIMENTAL POPULATION ON PREDICTED COLLEGE AVERAGES

Group or grouping              Number    Range           Mean    Standard deviation
1A                             35        1.34 to 1.95    1.64    0.18
1B                             41        1.33 to 1.86    1.54    0.14
1C                             31        1.33 to 1.80    1.53    0.14
2A                             34        1.02 to 1.31    1.17    0.07
2B                             66        1.01 to 1.31    1.16    0.10
2C                             50        1.01 to 1.31    1.17    0.09
3A                             42        0.41 to 1.00    0.78    0.17
3B                             47        0.36 to 1.00    0.81    0.15
3C                             36        0.63 to 1.00    0.87    0.11
A, all positive deviates       111       0.41 to 1.95    1.17    0.39
B, all non-deviates            154       0.36 to 1.86    1.16    0.30
C, all negative deviates       117       0.63 to 1.80    1.17    0.27
Total population               382       0.36 to 1.95    1.17    0.32

TABLE VIII
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE EXPERIMENTAL POPULATION ON THE DIFFERENCES BETWEEN THE OBTAINED AND THE PREDICTED COLLEGE AVERAGES

Group or grouping              Number    Range             Mean     Standard deviation
1A                             35        0.34 to 1.43      0.62     0.24
1B                             41        -0.11 to 0.33     0.12     0.14
1C                             31        -0.96 to -0.13    -0.45    0.20
2A                             34        0.34 to 1.21      0.58     0.23
2B                             66        -0.10 to 0.33     0.13     0.13
2C                             50        -1.06 to -0.13    -0.39    0.25
3A                             42        0.37 to 1.33      0.66     0.25
3B                             47        -0.09 to 0.33     0.13     0.14
3C                             36        -1.28 to -0.13    -0.38    0.24
A, all positive deviates       111       0.34 to 1.43      0.62     0.24
B, all non-deviates            154       -0.11 to 0.33     0.13     0.13
C, all negative deviates       117       -1.28 to -0.13    -0.45    0.24
Total population               382       -1.28 to 1.43     0.11     0.44

TABLE IX
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON PREDICTED COLLEGE AVERAGES

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         1.64       1.53       0.11          0.04     2.79      99.7
1A       3A         1.64       0.78       0.86          0.04     21.41     100
1A       3C         1.64       0.87       0.77          0.04     21.68     100
1C       3A         1.53       0.78       0.75          0.04     20.64     100
1C       3C         1.53       0.87       0.66          0.03     21.21     100
3A       3C         0.78       0.87       0.09          0.03     -2.81     99.7
A        C          1.17       1.17       0.00          0.04     0.00      0

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

TABLE X
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE DIFFERENCES BETWEEN PREDICTED COLLEGE AVERAGES AND OBTAINED COLLEGE AVERAGES

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         0.62       -0.45      1.07          0.05     19.75     100
1A       3A         0.62       0.66       0.04          0.06     -0.71     76
1A       3C         0.62       -0.38      1.00          0.06     17.55     100
1C       3A         -0.45      0.66       1.11          0.05     -21.06    100
1C       3C         -0.45      -0.38      0.07          0.05     -1.30     90
3A       3C         0.66       -0.38      1.04          0.06     18.72     100
A        C          0.62       -0.45      1.07          0.03     33.65     100

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 2
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR PREDICTED COLLEGE AVERAGES

Since the 1A and the 1C groups are, by definition, those groups comprising the students whose college work was
predicted to be good, and since the data as 
presented in the table and graph indicate the 
means of both 1A and 1C to be greater than 
either 3A or 3C, it is obvious that the pro- 
cedure followed produced groups, which with 
statistical certainty possessed differences 
greater than zero, in the measure used, namely 
predicted college grade point average.

CHART 3
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR PREDICTED AND OBTAINED COLLEGE AVERAGES

Likewise, reference to Table X and Chart 3 will also show the statistical certainty of a difference greater than zero between the means of
groups 1A and 1C, 1A and 3C, 3A and 3C, 
and 3A and 1C on distributions of their re- 
spective differences between predicted and 
obtained college grade point averages. In 
this case, the means of either of the 1A and 
3A groups are greater than either of the 
means of the 3C and 1C groups. This indi- 
cates that the procedure of selecting extreme 
groups produced groups significantly different
in the amount and direction of the differences 
between obtained college averages and pre- 
dicted college averages. 


CHAPTER IV 


STATISTICAL ANALYSIS AND 
FINDINGS 


The characteristics of the various sub- 
groups of the experimental population, as 
indicated by the measures applied and by the 
computation of the reliability of the differ- 
ences of paired groups, will be presented in 
this chapter under two major headings, 
namely, academic measures and non-academic 
measures. 


I. ACADEMIC MEASURES 


Mechanics of English Usage—Barrett-Ryan
English Test. 1. The data as presented
in Table XI and Chart 4 show no completely 
reliable differences between the total positive 
deviates (A) and the total negative deviates 
(C) in their use of the mechanics of English, 
although there are about sixty-nine chances 
in a hundred that the positive deviates excel 
the negative deviates on this measure. 

Wagner¹ in her study of variations from
predicted college achievement reported a 
superiority of the total positive deviates over 
the total negative deviates somewhat greater 
than that found in this investigation. How- 
ever, both findings are alike in that no 
statistically reliable differences are indicated. 

2. The good positive deviates (1A) demon- 
strate a reliable superiority over all groups 
except the good negative deviates (1C). 
However, an approximately reliable difference is shown (98 chances in 100) in favor of the good positive deviates over the good negative deviates.

¹ Mazie Earle Wagner, “Studies in Academic Motivation,” Studies in Articulation of High School and College (University of Buffalo Studies, XIII; Buffalo, N. Y.: University of Buffalo, 1936), p. 198.

CHART 4
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR THE BARRETT-RYAN ENGLISH TEST

3. The poor negative deviates (3C) tend 
to excel the poor positive deviates (3A), but 
without statistical reliability (93 chances in 
100). 

4. The students who were predicted to do 
work below the average and who exceeded 
expectations (3A) show the poorest rating of 
all groups in their knowledge of the mechanics 
of English. The mean for this group was 81.45, as compared with 87.30, 105.07, and 112.75, respectively, for the deviating groups classified as poor negatives (3C), good negatives (1C), and good positives (1A).

TABLE XI
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE BARRETT-RYAN ENGLISH TEST

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         112.75     105.07     7.68          3.69     2.08      98
1A       3A         112.75     81.45      31.30         3.78     8.29      100
1A       3C         112.75     87.30      25.45         3.92     6.50      100
1C       3A         105.07     81.45      23.62         3.54     6.67      100
1C       3C         105.07     87.30      17.77         3.69     4.82      100
3A       3C         81.45      87.30      5.85          3.77     -1.55     93
A        C          95.79      94.44      1.35          2.63     0.51      69

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

These findings indicate a superiority in 
English usage, as measured by the Barrett-Ryan
test, of students of good promise (1A
and 1C) over students of poor promise (3A 
and 3C) at the time of college entrance. They 
also show that students of good promise who 
improve the quality of their academic per- 
formance (1A) demonstrate better mastery 
of English usage than similar students whose 
achievements fall below the level predicted 
(1C). Thus, for students of good promise, 
English usage seems to be a quality associated 
with, or existing concurrently with, the 
improvement of academic record. However, 
for the students of poor promise (3A and 3C) this
situation does not hold true. There is thus 
indicated a need for further investigation of 
the causes, other than the differences in the 
mastery of the mechanics of English, that 
tend to produce better scholastic records than 


those predicted, especially in the case of students of poor promise.


Reading Comprehension—The Shank 
Tests of Reading Comprehension. Reading is 
generally considered the most important tool 
in the learning process, especially on the col- 
lege level, because students must depend very 
largely upon their ability to acquire facts and 
ideas from the printed page in order to achieve 
the objectives set for them. Shuttleworth² reported a zero order coefficient of correlation of only 0.462 between reading comprehension and freshman scholarship grade point average. The present investigation demonstrates a similar relationship. As will be noted by referring to Table XII and Chart 5, both the positively and the negatively deviating groups of students classified as above average (1A and 1C) showed means of 78.69 and 75.20, respectively. These were reliably higher than the respective means of the positively and negatively deviating groups of below average students (3A and 3C), which were 57.28 and 65.20, respectively. The inference is that students of good promise possess markedly superior ability to read comprehendingly.

² Frank K. Shuttleworth, “Environmental and Character Factors Involved in Scholastic Success,” Journal of Educational Psychology, XX (September, 1929), 427.

CHART 5
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR SHANK TESTS OF READING COMPREHENSION

TABLE XII
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SHANK TESTS OF READING COMPREHENSION

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         78.69      75.20      3.49          3.32     1.05      85
1A       3A         78.69      57.28      21.41         3.00     7.13      100
1A       3C         78.69      65.20      13.49         3.14     4.29      100
1C       3A         75.20      57.28      17.92         3.17     5.65      100
1C       3C         75.20      65.20      10.00         3.30     3.03      100
3A       3C         57.28      65.20      7.92          2.99     -2.65     99.6
A        C          66.70      68.62      1.92          2.06     -0.93     82

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

Further inferences that may be drawn from 
the data of this study are presented below. 


1. The possession of differing degrees of 
reading skill appears to have but little, if 
any, relationship to variation from predicted 
scholastic records. The chances are only 
about eighty-three in one hundred that there 
is a difference greater than zero between the 
total positive (A) and total negative (C) 
groups, in favor of the latter. There was a 
relatively small difference of 1.92 between
their means. 


2. The chances are but eighty-five in one 
hundred that good positive deviates (1A) 
excel the good negative deviates (1C) on 
this measure. 


3. Poor negative deviates (3C) are almost 
reliably superior (99.6 chances in 100) to 
poor positive deviates (3A). 


4. Relatively high reading comprehension 
appears to be a significant characteristic of 
students of good promise. However, no con- 
vincing evidence is available to prove that it 
constitutes a factor which distinguishes the 
positive deviates (A) from the negative deviates (C). Had this been the case, then the
poor negative deviates (3C), with better 
reading ability than the poor positive devi- 
ates (3A) to begin with, should have obtained 
better grades in college. It is obvious that
the opposite was true. Some other qualities 
than reading comprehension must have been 
influencing the poor positive deviates (3A) 
to exceed expectancy, especially in view of 
the fact that their percentile scores on the 
American Council on Education Psychological 


UNPREDICTED SCHOLASTIC ACHIEVEMENT 




Examination are reliably inferior to those of 
the poor negative deviates (3C).³

Language Mechanics and Literature—
Sones-Harry High School Achievement Test,
Part I. This test contains sections which 
represent a sampling of several aspects of 
English including grammatical constructions, 
word meanings, abbreviations and prefixes, 
foreign phrases, reading comprehension, and 
literary forms, authorship, characters, pass- 
ages, and themes. It therefore tests for liter- 
ary knowledge, reading skills, and language 
usage. 

The observations concerning the data secured from this test follow:

1. It is to be expected that the findings with reference to group differences would approximate those recorded in connection with the Barrett-Ryan English Test and the Shank Reading Tests. An examination of Table XIII and Chart 6 in connection with Charts 4 and 5 will show that this expectation is correct.

2. There are no completely reliable differences between the positive (A) and negative deviates (C) as a whole.

3. Reliable differences are present only between students of good promise and students of poor promise: both the good positive deviates and the good negative deviates (1A and 1C) are definitely superior to both the poor positive (3A) and the poor negative deviates (3C).

4. Some indication of a possible superiority of the good positive deviates (1A) over the good negative deviates (1C) (77 chances in 100), and of the poor negative deviates (3C) over the poor positive deviates (3A) (about 66 chances in 100), is likewise noted. These differences are considerably less significant than they were on the Shank and the Barrett-Ryan tests.

³ By referring to Table IV it will be noted that the mean American Council percentile score of the poor negative deviates (3C) is 33.47, and that of the poor positive deviates (3A) is 21.19. The difference between the means is 12.28, which represents a ratio of the difference to its standard error of 3.20 and, therefore, practical reliability.

TABLE XIII
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL ACHIEVEMENT TEST, PART I, LANGUAGE AND LITERATURE

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         84.41      80.78      ...           ...      0.77      77
1A       3A         84.41      52.77      31.64         ...      8.18      100
1A       3C         84.41      ...        ...           ...      7.40      100
1C       3A         80.78      52.77      ...           ...      6.34      100
1C       3C         80.78      ...        ...           ...      ...       ...
3A       3C         52.77      ...        ...           ...      -0.73     76
A        C          68.21      ...        ...           ...      ...       ...

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 6
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR SONES-HARRY ACHIEVEMENT TEST, LANGUAGE AND LITERATURE

5. The total positively deviating population (A) is indicated as possibly better than the total negative group (C) on the Sones-Harry test (83 chances in 100) and on the Barrett-Ryan test (69 chances in 100), while the reverse is true on the Shank tests (83 chances in 100). These tendencies may be interpreted as indicating that there is more difference between groups in the language phases of these tests than in the knowledge aspects. All tests demonstrate completely reliable differences only between students of good promise and students of poor promise. However, on the Barrett-Ryan test the good positive deviates (1A) are nearly reliably better than the good negative deviates (1C) (98 chances in 100), and the poor negative deviates (3C) tend to be better than the poor positive deviates (3A) (93 chances in 100).
6. It is not reasonable to infer that the various aspects of language usage account for the variation of students’ performance from that predicted for them. On all three tests involving language usage, the findings show consistently, although not with complete reliability, that the group of good positive deviates (1A) is superior to its corresponding group of negative deviates (1C). The reverse is true of the students of poor promise, among whom the negative deviates (3C) excel the positive deviates (3A).
Other factors such as motivation, environ- 
mental influences, personal limitations and 
adaptations, may be contributing to the rec- 
orded variation between predicted perform- 
ance and actual achievement. 

General Mathematics—Sones-Harry High
School Achievement Test, Part II. Analysis
of group achievement in mathematics as measured by the Sones-Harry Test resulted in the
following observations: 

1. The superior students (1A and 1C) 
possess superior ability in mathematics. 
Whether it is the mastery of mathematics 
or the possession of those qualities which 
facilitate such mastery that differentiates the 
superior student from the student of lesser 


achievement is not shown in this study. Investigators report a positive relationship between mathematical ability and academic adjustment, with zero order coefficients of correlation varying around a median of about 0.40.

Segel⁴ surveyed the investigations concerned with mathematics ability and subsequent general college scholarship. He reports coefficients of correlation of 0.12 and 0.42 found by Remmers and Stoddard, respectively, when the mathematics aptitude section of the Iowa Placement Examination was used. He also reports coefficients of 0.38 and 0.35 found by Brown and by Stoddard, respectively, when the mathematics training section of the same test was applied. He reports further that Dvorak and Salyer found a coefficient of 0.58 when both sections of the same test were combined.

Douglass⁵ found a correlation coefficient of 0.44 between high school marks in mathematics and freshman grade point average. He reports⁶ that Brammel and that Lauer and Evans found zero order coefficients of correlation of 0.39 and 0.47, respectively, in similar investigations.

⁴ Segel, Prediction of Success in College, p. 62.

⁵ Douglass, The Relation of High School Preparation and Certain Other Factors to Academic Success at the University of Oregon, p. 25.

⁶ Loc. cit.

TABLE XIV
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL ACHIEVEMENT TEST, PART II, MATHEMATICS

Groups compared     Mean of    Mean of    Difference    S.E.     D/S.E.    Chances
Group 1  Group 2    group 1    group 2    of means      diff.    diff.*    in 100
1A       1C         44.00      33.31      10.69         3.58     2.99      99.9
1A       3A         44.00      24.05      19.95         3.31     6.03      100
1A       3C         44.00      26.69      17.31         3.19     5.42      100
1C       3A         33.31      24.05      9.26          3.05     3.03      100
1C       3C         33.31      26.69      6.62          2.93     2.26      98.7
3A       3C         24.05      26.69      2.64          2.59     -1.02     84
A        C          32.47      32.76      0.29          1.97     -0.15     56

* The first group possesses the trait to a greater degree than the second group except where indicated by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 7
RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR SONES-HARRY ACHIEVEMENT TEST, PART II, MATHEMATICS

2. The data (Table XIV and Chart 7) 
definitely demonstrate a reliable superiority 
of the good positive deviates (1A) over all 
other groups. 

3. The good negative deviates (1C) are 
nearly (98.7 chances in 100) reliably superior 
to the poor negative deviates (3C). They 
are definitely better than the students of poor promise whose work was better than predicted (3A), since they obtained an average of 33.31 as compared with a mean of 24.05 for the inferior group (3A).

4. For the students of poor promise who 
fell short of expectancy (3C) there is noted 
a tendency (84 chances in 100) to do better 
work in mathematics than similar students 
whose records exceeded predictions (3A). 

5. As a whole, there is no significant dif- 
ference between the knowledge of mathe- 
matics of those students who did better work 
than expected (A) and the knowledge of 
mathematics of those who did poorer work 
than was predicted for them (C). 

Mathematical Reasoning - Progressive
Mathematics Tests, Advanced, Form A, Test
3 - Mathematical Reasoning. The only
difference of any importance between the
findings for the mathematical reasoning section
of the Progressive tests and those for the
mathematics section of the Sones-Harry test
lies in the comparison of the good positive and
good negative groups (1A and 1C). This
statement may be verified by comparing Table
XV and Chart 8 with Table XIV and Chart 7.
For the Sones-Harry test the critical ratio,
that is, the difference between the means
divided by the standard error of that
difference, was 2.99, which indicates practical
statistical certainty of a difference greater
than zero. In the reasoning section of the
Progressive test, on the other hand, the
critical ratio was 2.16, which indicates
approximately ninety-eight chances in one
hundred that there is a reliable difference.
For practical purposes it is justifiable to
assume that both tests are relatively
comparable in all of the differences
that are evidenced. The superiority of
students of good promise (1A and 1C),
regardless of deviation, over poor students
(3A and 3C) is the dominant fact observed,
just as it was on the Sones-Harry test.

TABLE XV

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE PROGRESSIVE MATHEMATICS
TESTS, FORM A, TEST 3 - MATHEMATICAL REASONING

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        46.31     42.47      3.84        1.77     2.16     98.3
1A       3A        46.31     36.05     10.26        1.69     6.07     100
1A       3C        46.31     37.78      8.53        1.71     5.00     100
1C       3A        42.47     36.05      6.42        1.84     3.49     100
1C       3C        42.47     37.78      4.69        1.86     2.53     99.4
3A       3C        36.05     37.78      1.73        1.77    -0.98     83
A        C         40.52     40.84      0.32        1.18    -0.27     60

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 8

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
PROGRESSIVE MATHEMATICS TEST, ADVANCED,
FORM A, TEST 3 - MATHEMATICAL REASONING

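Each table converts the ratio of a difference to its standard error into “chances in 100” that the true difference is greater than zero. The following is a minimal sketch of that conversion. It assumes a normal sampling distribution for the difference, the usual assumption behind such tables, although this section does not state it. The function name `chances_in_100` is illustrative, and the inputs are rows copied from Tables XIV and XV.

```python
import math
from typing import Tuple


def chances_in_100(diff: float, se_diff: float) -> Tuple[float, float]:
    """Return (critical ratio, chances in 100 that the true difference exceeds zero).

    Assumes a normal sampling distribution for the difference between means,
    so the chances are 100 * Phi(|D / S.E. diff.|), Phi being the standard
    normal cumulative distribution function.
    """
    ratio = diff / se_diff
    phi = 0.5 * (1.0 + math.erf(abs(ratio) / math.sqrt(2.0)))
    return ratio, 100.0 * phi


# (difference between means, S.E. of the difference) as printed in the tables.
# The minus signs in the tables only mark direction, so positive differences
# are used here.
rows = {
    "Table XIV, 1A vs 1C": (10.69, 3.58),
    "Table XIV, 1C vs 3C": (6.62, 2.93),
    "Table XIV, 3A vs 3C": (2.64, 2.59),
    "Table XV, 1A vs 1C": (3.84, 1.77),
}

for label, (d, se) in rows.items():
    ratio, chances = chances_in_100(d, se)
    print(f"{label}: D/S.E. = {ratio:.2f}, chances in 100 = {chances:.1f}")
```

The recomputed figures agree with the printed ones to within a few tenths of a point. For example, the sketch gives 98.5 rather than 98.3 for the Table XV comparison. The discrepancies presumably reflect rounding in the printed standard errors or in the probability table used.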
Mathematical Computation - Progressive
Mathematics Tests, Advanced, Form A, Test
4 - Mathematical Computation. 1. For this
test, differences between groups are found
similar to those observed in connection with
the Sones-Harry Test, Part II, and the
reasoning section of the Progressive tests.
This statement may be verified by comparing
Table XVI and Chart 9 with Tables XIV
and XV and Charts 7 and 8.


2. The difference between the means (5.69)
in favor of the good positive deviates (1A)
over the good negative deviates (1C) is less
pronounced on the computation section of the
Progressive test than on the Sones-Harry test,
where the difference between the means is
10.69. However, the difference is greater for
computation than it is for reasoning (3.84).

3. For the poor negative deviates (3C)
some superiority over the poor positive
deviates (3A) is indicated on all three tests.
Only on the Sones-Harry test is reliability
approximated, with eighty-four chances in one
hundred that the difference is greater than
zero.

4. No significant differences between the 
total positive deviates (A) and total negatives 
(C) are found on any one of the three mathe- 
matics tests. 

Wagner* reported a more marked, although
not a completely reliable relationship between 
mathematics and deviation from predicted 
college averages than was found in this in- 
vestigation. Her data indicated a superiority 
of positive deviates over negative deviates by 
better than ninety-nine chances in one hun- 
dred for boys. However, the negatively devi- 
ating girls were indicated as superior by sixty- 
nine chances in one hundred. Had Wagner 
combined the boys and girls and computed 
differences accordingly it is probable that 
findings of the two investigations would have 
been in closer apparent agreement. Moreover, 
for neither investigation are the reliabilities of 
the differences complete enough to warrant the
conclusion that the lack of correspondence
between the findings is due to factors other
than sampling.

* Wagner, op. cit., p. 198.

TABLE XVI

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE PROGRESSIVE MATHEMATICS
TESTS, FORM A, TEST 4 - MATHEMATICAL COMPUTATION

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        62.19     56.50      5.69        3.27     1.74     96
1A       3A        62.19     46.95     15.24        3.04     5.02     100
1A       3C        62.19     47.92     14.27        2.69     5.31     100
1C       3A        56.50     46.95      9.55        3.58     2.67     99.6
1C       3C        56.50     47.92      8.58        3.29     2.61     99.5
3A       3C        46.95     47.92      0.97        3.06    -0.32     62
A        C         53.31     53.31      0.00        1.93     0.00     0

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 9

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
PROGRESSIVE MATHEMATICS TEST, ADVANCED,
FORM A, TEST 4 - MATHEMATICAL COMPUTATION

Natural Science - Sones-Harry High
School Achievement Test, Form A, Section
III. 1. This natural science test samples the
subject’s knowledge of many types of natural
phenomena. From experience one would
expect the students of good promise to
demonstrate superiority over the students of
poor promise. The data in Table XVII and 
Chart 10 confirm this view. Each of the groups 
of students of good promise (1A and 1C) 
has a reliably, or approximately reliably, 
greater mastery of scientific facts than each 
of the groups of less able students (3A and 
3C), and the students of good promise who 
exceed their predicted record (1A) give some 
evidence (83 chances in 100) of superior 
knowledge over the students of good promise 
who did not make the record expected (1C). 

2. The positive deviates (A) as a group 
demonstrate somewhat greater knowledge of 
scientific facts than the total group of nega- 
tive deviates (C). Their mean is 1.73 points 
greater, and the chances of a difference
between the groups greater than zero are about
eighty-eight in one hundred.

This finding compares favorably with that 
of Wagner* who reported that the direction 
of difference in science knowledge is in favor 
of the positive deviates. She found a com- 
pletely reliable difference for the boys, and 
a chance of eighty-eight in one hundred that 
the positively deviating girls are superior to 
those who vary negatively from predicted 
college averages. 

This difference may be interpreted as an 
indication of the probability of a relationship 
of a student’s possession of scientific knowl- 
edge to his academic adjustment in college. 

Observation of the practices followed in
evaluating students’ work leads to the view
that the possession of facts is of primary
importance and receives marked weighting
when professors estimate marks. In this
connection, reference to Chart 6, which
presents data concerning group achievement
on the language and literature test, will also
indicate a tendency of the total positively
deviating group (A) to demonstrate greater
knowledge of factual material than the
negatively deviating group as a whole (C).
This difference is viewed as having some
significance, since it cannot be regarded as a
natural concomitant of intelligence: as will be
noted by referring to Table IV, the negatively
deviating group has an average percentile
score 6.28 points greater than the positively
deviating group.

* Wagner, loc. cit.

TABLE XVII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A, SECTION III - NATURAL SCIENCE

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        41.01     38.30      2.71        2.84     0.95     83
1A       3A        41.01     31.73      9.28        2.36     3.93     100
1A       3C        41.01     32.10      8.91        2.61     3.41     100
1C       3A        38.30     31.73      6.57        2.37     2.77     99.7
1C       3C        38.30     32.10      6.20        2.62     2.36     99
3A       3C        31.73     32.10      0.37        2.09    -0.18     57
A        C         36.31     34.58      1.73        1.45     1.19     88

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 10

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT
TEST, FORM A, SECTION III - NATURAL SCIENCE

Social Science—Sones—Harry High School 
Achievement Test, Form A, Section IV. 1. 
The data presented in Table XVIII and Chart 
11 show some superiority of all positive de- 
viates (A) over all negative deviates (C) in 
social science knowledge, although in no case 
is the difference between the groups com- 
pletely reliable. There are ninety-four chances 
in one hundred that the total positive deviates 
have more knowledge of this type than the 
total negative deviates. This finding compares 
favorably with that reported by Wagner* for
history. She found a difference favoring both 
the boys (99.8 chances in 100) and the girls 
(98.5 chances in 100) whose college averages 
exceeded those predicted for them. The dif- 
ference which she reported is somewhat more 
marked than that found in this investigation. 
This lack of agreement is probably due to the 
difference in the method of arranging the 
groups. She included in the deviating groups 
students whose obtained college grade varied 
from the mean by one or more probable errors 
of the distribution, while in the present study 
the deviates were classified as those students 
who varied by one-half of a standard devia- 
tion, or .74 of a probable error, of the distri- 
bution. Thus, Wagner’s deviating groups con- 
tained more extreme cases than the groups in 
the present investigation. Consequently, a 
greater difference in the characteristics of the 
positive (A) and negative (C) deviates is to 
be expected. 

2. There are about ninety-nine chances in
one hundred that the good positive deviates
(1A) know more social facts than the good
negative deviates (1C), and eighty-four chances
in one hundred that the poor positive deviates
(3A) have such knowledge superior to the
poor negative deviates (3C).

* Wagner, op. cit., p. 198.

TABLE XVIII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A, PART IV - SOCIAL SCIENCE

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        63.99     55.30      8.69        3.80     2.28     98.9
1A       3A        63.99     45.27     18.72        3.41     5.48     100
1A       3C        63.99     41.87     22.12        3.52     6.29     100
1C       3A        55.30     45.27     10.03        3.74     2.68     99.6
1C       3C        55.30     41.87     13.41        3.84     3.50     100
3A       3C        45.27     41.87      3.40        3.45     0.99     84
A        C         52.25     48.74      3.51        2.19     1.60     94

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 11

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT
TEST, FORM A, SECTION IV - SOCIAL SCIENCE

3. The findings also strongly indicate that
students predicted as better than average (1A
and 1C) have a greater mastery of social
science information than the students of poor
promise (3A and 3C). It will be noted that
both the positive and the negative groups of
good students (1A and 1C) obtained mean
scores on the social science test of 63.99 and
55.30, respectively, while the poor negative
deviates (3C) received an average score of
41.87, and the poor positive deviates (3A)
averaged 45.27.

These observations tend to confirm the view 
expressed in connection with natural science, 
namely, that there is some likelihood that a 
knowledge of facts tends to contribute to a 
student’s academic adjustment in college, at 
least in so far as adjustment is measured by 
professors’ marks. 
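The conversion between the two classification criteria mentioned under social science can be checked directly. For a normal distribution the probable error is the deviation bounding the middle half of the cases, about 0.6745 of a standard deviation. One-half of a standard deviation is therefore roughly 0.74 of a probable error. The sketch below assumes the distribution of deviations from predicted averages is approximately normal. The tail proportions it prints are illustrative and are not figures reported by either study.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probable error in standard-deviation units: the 75th percentile of N(0, 1).
pe_in_sd = standard_normal.inv_cdf(0.75)

# Present study: deviates differ from prediction by at least 1/2 S.D.
half_sd_in_pe = 0.5 / pe_in_sd

print(f"1 P.E. = {pe_in_sd:.4f} S.D.")
print(f"1/2 S.D. = {half_sd_in_pe:.2f} P.E.")

# Proportion of a normal distribution lying beyond each cutoff on one side,
# i.e. the expected share of cases classed as deviates in that direction.
beyond_half_sd = 1 - standard_normal.cdf(0.5)
beyond_one_pe = 1 - standard_normal.cdf(pe_in_sd)
print(f"beyond 1/2 S.D. (present study): {beyond_half_sd:.1%}")
print(f"beyond 1 P.E. (Wagner):          {beyond_one_pe:.1%}")
```

Under the normality assumption, the students beyond the cutoff on either side would make up about 31 per cent of the cases in the present study, against 25 per cent in Wagner’s. This is consistent with the statement that her deviating groups contained more extreme cases.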

General Academic Achievement—Sones— 
Harry High School Achievement Test, Form 
A—(Total Score). 1. Differences between 
the total scores made by the various groups 
on the Sones—Harry test are given in Table 
XIX and Chart 12. When Chart 12 is com-
pared with Charts 6, 7, 8, 10, and 11, which
present graphically the differences in achieve- 
ment on the tests of knowledge of subject 
matter primarily, a marked similarity between 
all the charts is noted. This likeness was to 
be expected because of the relationship of the 
sub-divisions of the Sones—Harry test to the 
total score of the test, and because of the 
measurement of like or similar achievements 
by the mathematical reasoning section of the 
Progressive Mathematics test and the mathe- 
matics section of the Sones—Harry test. The 
total score of the Sones—Harry test may there- 
fore be presented as a composite measure of 
the attainments of groups in knowledge fields. 

2. Above average groups (1A and 1C) are
markedly superior to the below average groups
(3A and 3C) in scholastic achievements as
measured by the Sones-Harry test. Table
XIX shows the averages of 231.18 and 207.07
for the former, and 151.22 and 155.29 for
the latter.

TABLE XIX

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A (TOTAL SCORE)

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        231.18    207.07    24.11        10.43    2.31     98.9
1A       3A        231.18    151.22    79.96         8.67    9.23     100
1A       3C        231.18    155.29    75.89         9.12    8.32     100
1C       3A        207.07    151.22    55.85         9.52    5.87     100
1C       3C        207.07    155.29    51.78         9.93    5.21     100
3A       3C        151.22    155.29     4.07         8.06   -0.51     69
A        C         187.14    180.60     6.54         6.45    1.01     84

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 12

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT

3. Good positive deviates (1A) are
indicated as having more knowledge than good
negative deviates (1C), with a nearly reliable
difference (98.9 chances in 100) of 24.11
between the means of the two groups.

4. There is no significant difference
between the positive and negative deviates
among the students of poor promise (3A and
3C).

5. A probable superiority of all positive 
deviates (A) over the total negative deviates 
(C) is indicated (84 chances in 100). There 
is a difference between their means of 6.54. 
This difference may be interpreted as signi- 
fying that subject matter knowledge probably 
influences students to do work in college bet- 
ter than that predicted for them. However, 
when Table XIX and Chart 12 are compared 
with Table XX and Chart 13, which present 
the differences between the groups on their 
original high school records, this view seems 
hardly tenable, for a striking similarity of the 
difference is noted. The groups between which 
the differences are completely reliable, and 
those which are nearly reliable, correspond 
exactly. Moreover, while on the Sones—Harry 
test the poor negative deviates (3C) are in- 
dicated as excelling the poor positive deviates 
(3A) with approximately sixty-nine chances 
in one hundred of there being a difference 
greater than zero, in terms of high school 
record the chances are eighty-four in one hun- 
dred that this is true. Furthermore, on the 
Sones—Harry test the total group of positive 
deviates (A) is indicated as superior to the 
total group of negative deviates (C), by 
eighty-four chances in one hundred, while the 
high school record shows seventy-nine chances 
in one hundred that the same is true. On the 
basis of these comparisons it seems more
justifiable to hold that the differences
evidenced are more a function of the procedure
followed in the original organization of the
groups than of characteristics which influence
variation from predicted college performance.
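The correspondence claimed between Tables XIX and XX can be made explicit by labelling each critical ratio. The sketch below uses two cutoffs, both assumptions rather than rules stated in this section. A ratio of 3.0 marks a completely reliable difference, a common convention of the period. A ratio of 2.0, an arbitrary choice, marks a nearly reliable one. The ratios are copied from the two tables, and the helper name `reliability` is hypothetical.

```python
# Critical ratios (D / S.E. diff.) as printed in Tables XIX and XX.
pairs = ["1A-1C", "1A-3A", "1A-3C", "1C-3A", "1C-3C", "3A-3C", "A-C"]
sones_harry_total = [2.31, 9.23, 8.32, 5.87, 5.21, -0.51, 1.01]    # Table XIX
high_school_avg = [2.75, 17.15, 18.73, 14.47, 15.65, -1.02, 0.81]  # Table XX


def reliability(ratio: float, complete: float = 3.0, nearly: float = 2.0) -> str:
    """Label a critical ratio; both cutoffs are assumptions, not the author's."""
    size = abs(ratio)
    if size >= complete:
        return "completely reliable"
    if size >= nearly:
        return "nearly reliable"
    return "not reliable"


for pair, a, b in zip(pairs, sones_harry_total, high_school_avg):
    same_direction = (a > 0) == (b > 0)
    print(f"{pair:6} {reliability(a):20} {reliability(b):20} same direction: {same_direction}")
```

With these assumed cutoffs, every pair falls into the same category and points in the same direction on both measures. This is the correspondence described in item 5.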
TABLE XX

DIFFERENCES BETWEEN THE MEANS OF THE SUB-GROUPS ON HIGH SCHOOL AVERAGES

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.*    in 100
                                       means
1A       1C        3.49      3.28      0.21         0.08     2.75     99.7
1A       3A        3.49      2.16      1.33         0.08    17.15     100
1A       3C        3.49      2.23      1.26         0.07    18.73     100
1C       3A        3.28      2.16      1.12         0.08    14.47     100
1C       3C        3.28      2.23      1.05         0.07    15.65     100
3A       3C        2.16      2.23      0.07         0.07    -1.02     84
A        C         2.78      2.72      0.06         0.07     0.81     79

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

II. NON-ACADEMIC MEASURES


Home Adjustment—Bell Adjustment In- 
ventory. 1. The findings of this investigation 
reveal no completely reliable differences be- 
tween any of the groups on the home adjust- 
ment section of the Bell Adjustment Inven- 
tory (Table XXI and Chart 14). However, 
one striking fact is indicated, namely, that 
with nearly complete reliability the poor nega- 
tive deviates (3C) are superior in their home 
adjustment to any of the other deviating
groups. The differences between the mean of
this group and of each of the other three 
groups when divided by their respective 
standard errors are 2.29, 2.75, and 2.44, a 
result which indicates about ninety-nine 
chances in one hundred that the differences 
are greater than zero. 


2. There is some suggestion that the good 
positive deviates (1A) experience better (83
chances in 100) home adjustment than the 
good negative deviates (1C). 

3. The negative deviates as a whole (C) 
tend to be better adjusted than the positive 
deviates (A) although the data are not con- 
clusive in this regard (80 chances in 100). 

These observations lead to the inference 
that adjustment to home conditions influences 
variation from predicted performance, especi- 
ally for the students of poor promise. The 
data indicate that students with inferior home 
adjustment, as measured by this test, tend to
improve the quality of their scholastic attain- 
ment, while those with better home adjust- 
ment tend to take things easier in college, or 
at least accomplish less than expected in terms 
of their aptitude and previous achievement 
in school. 
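Because a low Bell score signifies good adjustment, the sign of the ratio in Tables XXI to XXIII is easy to misread. The sketch below assumes the sign convention implied by the table footnotes: the ratio is positive when the first group is the better adjusted, that is, has the lower mean. It applies the same normal-curve conversion to chances in 100 and recomputes the three ratios quoted in item 1 for the poor negative deviates (3C). The function name is illustrative.

```python
import math


def adjustment_comparison(mean1: float, mean2: float, se_diff: float):
    """Compare two groups on a Bell Adjustment Inventory section.

    A low score signifies good adjustment, so the ratio is positive when
    group 1 has the lower (better) mean and negative when group 2 does.
    Returns (difference, ratio, chances in 100), rounded as in the tables.
    """
    diff = abs(mean1 - mean2)
    ratio = (mean2 - mean1) / se_diff
    chances = 100 * 0.5 * (1 + math.erf(abs(ratio) / math.sqrt(2)))
    return round(diff, 2), round(ratio, 2), round(chances, 1)


# Table XXI (home adjustment): each group against the poor negative deviates (3C),
# using the group means and S.E. of each difference printed there.
mean_3c = 5.56
for group, mean, se in [("1A", 8.16, 1.14), ("1C", 9.47, 1.42), ("3A", 8.45, 1.18)]:
    print(group, "vs 3C:", adjustment_comparison(mean, mean_3c, se))
```

The recomputed ratios are -2.28, -2.75 and -2.45, against the printed -2.29, -2.75 and -2.44. The small discrepancies are consistent with rounding of the printed standard errors.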

Numerous investigations have been made to
discover the relationship between home
conditions and academic achievement. In a
survey of the literature reporting some of the
major studies in this connection, Sarbaugh*
points out the conflicts in the different
findings with reference to the importance of
various influences in determining the nature
of an environment favorable to academic
achievement. In her own investigation* of high
school seniors she found some differences in
the family backgrounds favorable to the
superior [...]. Harris,* in his survey of the
literature concerning investigations of the
relation of factors [...] with the home and
academic adjustments, reported conflicting
findings. In his own investigation* of college
freshmen he found no reliable relationships
between home surroundings and scholastic
success.

Shuttleworth,* in a study of college freshmen,
found a slight degree of relationship between
favorable intellectual and cultural home
backgrounds and scholastic success. On the
other hand, Wagner* reports the opposite. She
found more favorable home backgrounds for
the students who did not do work of as high
calibre as that predicted for them by their
previous records. This conclusion agrees in
general with the findings of this investigation,
especially for the students of poor promise,
although this study used the student’s
check-list of statements describing home
conditions rather than the actual objective
reports of the home backgrounds which
Wagner used.

A possible interpretation of the favorable
relationship of poorer home adjustment to
favorable variation from predicted college
work is given by Wagner in the following
statement:

Both because of their wish to obviate
present lack of cultural aspects in their
home environment and because of their
greater awareness of their educational
opportunity, these students are more likely
to make the most of their time investment.*

* Mary E. Sarbaugh, “Effect of Home Surroundings on
Academic Achievement,” Studies in Articulation of High
School and College (University of Buffalo Studies, XIII;
Buffalo, N. Y.: University of Buffalo, 1936), pp. 245-276.
* Ibid., pp. 275-276.
* Daniel Harris, “The Relation to College Grades of Some
Factors Other Than Intelligence,” Archives of Psychology,
XX, No. 131 (July, 1931), 13-14.
* Shuttleworth, op. cit., pp. 431-432.
* Wagner, op. cit., p. 234.
* Ibid., p. 208.

TABLE XXI

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART A - HOME ADJUSTMENT

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.†    in 100
                                       means
1A       1C        8.16      9.47      1.31         1.40     0.94     83
1A       3A        8.16      8.45      0.29         1.16     0.25     60
1A       3C        8.16      5.56      2.60         1.14    -2.29     98.9
1C       3A        9.47      8.45      1.02         1.44    -0.71     76
1C       3C        9.47      5.56      3.91         1.42    -2.75     99.7
3A       3C        8.45      5.56      2.89         1.18    -2.44     99.3
A        C         8.00      7.36      0.64         0.75    -0.85     80

* A low score signifies good adjustment.
† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 14

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
BELL ADJUSTMENT INVENTORY -
HOME ADJUSTMENT

Health Adjustment - Bell Adjustment
Inventory. 1. Health adjustment does not
appear to be a characteristic which
differentiates [...] negative deviates. Table XXII
[...]

TABLE XXII

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART B - HEALTH ADJUSTMENT

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.†    in 100
                                       means
1A       1C        6.93      7.82      0.89         1.12     0.79     79
1A       3A        6.93      7.35      0.42         0.95     0.44     66
1A       3C        6.93      6.28      0.65         1.01    -0.65     74
1C       3A        7.82      7.35      0.47         1.12    -0.42     65
1C       3C        7.82      6.28      1.54         ...     -1.32     90
3A       3C        7.35      6.28      1.07         1.01    -1.06     85
A        C         7.35      6.77      0.58         0.56    -1.03     84

* A low score signifies good adjustment.
† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

CHART 15

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
BELL ADJUSTMENT INVENTORY -
HEALTH ADJUSTMENT

2. The data indicate a health adjustment
of the poor negative deviates (3C) superior 
to that of the other deviating groups with 
chances in one hundred of seventy-four, 
eighty-five, and ninety for the three groups. 


3. The good positive deviates (1A) seem 
to be better adjusted (79 chances in 100) in 
the matter of health than the good negative 
deviates (1C). 


4. The health adjustment of the total 
positively deviating population (A) appears 
to be poorer (85 chances in 100) than that of 
the total negatively deviating group (C). 

5. The lack of health adjustment may be 
a factor which influences the achievement of 
a quality of college work superior to that 
expected, especially for the students of poor 
promise. The students in the best adjusted 
group appear to have made the poorest col- 
lege records. However, the findings are not 
conclusive because of the low reliabilities of 
the differences. 

Investigators report conflicting findings 
concerning the relationship between physical 
condition or the possession of physical defects 
and school marks. This conflict is shown in 
Harris’* review of the major studies that had
been reported. In his own investigation he 
found no reliable relationship between physi- 
cal defects and scholastic attainment. 

Wagner* in her study of the variations
from predicted achievement of college students 
reported that the students who made records 
better than expected estimated their health as 
“average” as contrasted with the ranking of 
“above average” for the less successful. On 
the other hand, Eckert,* in her study of
superior and inferior college students, by the 
use of student self-judgments found no per- 
ceptible difference between the two groups in 
physical fitness. 

The variations in findings are probably due 
to the inherent weaknesses of the questionnaire
method of securing data and to the consequent
basing of conclusions upon data with varying
degrees of reliability.

* Harris, op. cit., pp. 12-13.
* Wagner, op. cit., p. 234.
* Ruth E. Eckert, “Who is the Superior Student?” Studies
in Articulation of High School and College (University of
Buffalo Studies, IX; Buffalo, N. Y.: University of Buffalo,
1934), p. 35.

TABLE XXIII

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART C - SOCIAL ADJUSTMENT

Groups Compared    Mean of   Mean of   Difference   S.E.    D/S.E.    Chances
Group 1  Group 2   Group 1   Group 2   between      diff.   diff.†    in 100
                                       means
1A       1C        14.64     13.76      0.88        1.85    -0.48     68
1A       3A        14.64     14.00      0.64        1.88    -0.34     64
1A       3C        14.64     13.28      1.36        1.79    -0.76     77
1C       3A        13.76     14.00      0.24        1.82     0.13     55
1C       3C        13.76     13.28      0.48        1.7     -0.28     61
3A       3C        14.00     13.28      0.72        1.76    -0.41     65
A        C         14.22     14.03      0.19        1.02    -0.20     58

* A low score signifies good adjustment.
† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.

Social Adjustment—Bell Adjustment In- 
ventory. A study of Table XXIII and Chart 
16, which present the findings concerning the 
relative social adjustment of the various 
groups, reveals no appreciable differences
between any of the groups. Apparently social
adjustment as measured by the Bell Adjustment
Inventory is not a function either of
scholarship or of variations from predicted
scholastic achievement.

This finding is similar to that of Wagner*
who reports no appreciable difference between 
deviating groups on self-estimates of college 
freshmen on characteristics comparable to 
those measured by the social adjustment sec- 
tion of the Bell Adjustment Inventory. 


Emotional Adjustment—Bell Adjustment 
Inventory. 1. No reliable differences between 
the various groups in the quality of emotional 
adjustment as measured by the Bell Adjustment
Inventory were found (Table XXIV
and Chart 17).

2. There is a strong indication that, of all 
groups studied, the students of good promise 
who did not make the records predicted for 
them (1C) had the least satisfactory emo- 
tional adjustment. The chances in one hun- 
dred that the differences are greater than zero 
vary from ninety-three to ninety-six and one- 
half. Within these limitations of reliability 
the inference is made that students of good 
promise who obtain college records inferior 
to those expected (1C) are not as well ad- 
justed emotionally as those students who are 
predicted to do good work but exceed
expectations (1A). Consequently, it seems
probable that emotional adjustment involves a
quality influencing the kind of activity that
leads students of good promise to greater
achievement than expected, and that the lack
of such adjustment is associated with the
failure of students of good promise to obtain
the college averages predicted for them.
The observation that both groups of subjects 
show some superiority over the negatively 

* Wagner, op. cit., p. 229. 
emotional adjustments with low scholarship.
As Wagner** suggests, the relative lack of 
socio-economic security may be serving to 
stimulate the development of secondary and 
tertiary drives in an effort to compensate. 
Conversely, good home adjustment would not 
give rise to such compensatory motivation. 


e. The poor positive deviates (3A), while 
tending to be more poorly adjusted in their 
home, health, and emotional relations than 
the less successful below-average students 
(3C), appear at the same time to be better 
adjusted in these ways than the less success- 
ful students of good promise (1C). This 
finding suggests that favorable adjustments 
of these types are associated with students 
having less academic promise at the beginning 
of their college careers (3A and 3C). 


3. A second striking observation is noted 
concerning the findings on the Bell Adjustment
Inventory, namely, the similarity of the
relationship between the paired groups on 
the home adjustment, health adjustment, and 
emotional adjustment sections of the Bell 
questionnaire. If a relationship other than 
chance is assumed, it may be inferred that 
there is an inter-relationship between the 
qualities that are measured by the three dif- 
ferent tests. A classification of types of 
adjustment such as Bell has used may be 
even more in the nature of an administrative 
device than a valid test founded on different 

* Wagner, op. cit., p. 236.
discrete factors. Perry* in his investigation
of group factors in adjustment questionnaires 
cautions against the assignment of terms to 
indicated factors on the basis of inspection 
and rationalization. He concludes that name 
assignment to such a factor is arbitrary and 
depends on the meaning of the term as viewed 
by the person who designates it. 

Choice of Life Career at Time of College 
Entrance. For the purpose of this investiga- 
tion the only part of the vocational question- 
naire used was the question that asked 
whether or not a very definite choice of a 
life career had been made prior to registra- 
tion. A tabulation of the results and a 
graphic representation of the data are given 
in Table XXVI and Chart 18. Approxi- 
mately the same percentage of the total posi- 
tively deviating group (A) and the total 
negatively deviating group (C) had made 
their vocational decision, the percentages 
being 63.83 and 61.46, respectively. A dif- 
ference in percentage of only 1.85 was found 
between the good positive group (1A) and 
the good negative group (1C). There was a 
more marked difference between the poor 
positive group (3A) and the poor negative 
group (3C), namely, 6.38 in favor of the
former. A noteworthy fact is the great dif- 
ference between the students of good promise 
(1A and 1C) and the students of poor prom- 
ise (3A and 3C), for 52.13 per cent of the 
total group of students of good promise had 

Perry, op. cit., p. 79.


TABLE XXVI

PERCENTAGES OF POPULATION OF SUB-GROUPS WHO HAD MADE VOCATIONAL CHOICES AT TIME
OF ENTRANCE IN COLLEGE

                                  Total    Number who   Number who   Per cent who
Group                             number   answered     had made     had made
                                           question     choice       choice

1A                                  35        32           16          50.00
1B                                  41        35           19          54.29
1C                                  31        27           14          51.85
2A                                  34        29           18          62.07
2B                                  66        53           26          49.06
2C                                  50        40           24          60.00
3A                                  42        33           26          78.79
3B                                  47        39           20          51.28
3C                                  36        29           21          72.41
A   Total positive deviates        111        94           60          63.83
B   Total non-deviates             154       127           65          51.18
C   Total negative deviates        117        96           59          61.46
1   Total good students            107        94           49          52.13
2   Total average students         150       122           68          55.74
3   Total poor students            125       101           67          66.34
    Total                          382       317          184          58.04
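The per cent column of Table XXVI is the number who had made a choice divided by the number who answered the question, and the summary rows (A, B, C, 1, 2, 3, Total) pool the sub-group counts before dividing. A short Python check along those lines, offered as an illustration rather than as the author's own procedure, reproduces the printed figures from the transcribed counts; it also gives the difference between the 3A and 3C percentages (78.79 - 72.41 = 6.38 points). The names `COUNTS`, `percent_choice`, and `pooled_percent` are hypothetical.

```python
# Counts transcribed from Table XXVI: group -> (number who answered, number who had made a choice)
COUNTS = {
    "1A": (32, 16), "1B": (35, 19), "1C": (27, 14),
    "2A": (29, 18), "2B": (53, 26), "2C": (40, 24),
    "3A": (33, 26), "3B": (39, 20), "3C": (29, 21),
}


def percent_choice(answered: int, chose: int) -> float:
    """Per cent of respondents reporting a definite vocational choice, to two places."""
    return round(100 * chose / answered, 2)


def pooled_percent(groups: list[str]) -> float:
    """Sum the counts of several sub-groups first, then compute one percentage."""
    answered = sum(COUNTS[g][0] for g in groups)
    chose = sum(COUNTS[g][1] for g in groups)
    return percent_choice(answered, chose)


if __name__ == "__main__":
    for group, (answered, chose) in COUNTS.items():
        print(group, percent_choice(answered, chose))
    print("A  total positive deviates", pooled_percent(["1A", "2A", "3A"]))  # 63.83
    print("C  total negative deviates", pooled_percent(["1C", "2C", "3C"]))  # 61.46
    print("1  total good students", pooled_percent(["1A", "1B", "1C"]))      # 52.13
    print("3  total poor students", pooled_percent(["3A", "3B", "3C"]))      # 66.34
    print("Total", pooled_percent(list(COUNTS)))                             # 58.04
```

Pooling counts rather than averaging the sub-group percentages matters here, because the sub-groups differ in size.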
Wagner reported a tendency on the part
of the positively deviating group to claim to 
suffer greater disturbance from scoldings and 
to worry less than the negatively deviating 
group. She suggests that: 


the social sensitivity evidenced by these 
more successful students plays a real part 
in keeping them on their academic job.** 


Adjustment as Measured by the Bell Ad- 
justment Inventory. 1. Most striking is the 
observation that not a single completely 
reliable difference was found between any of 
the four extreme deviating groups on any one 
of the four types of adjustment purported 
to be measured by this test. This situation 
will be readily seen by referring to Charts 14, 
15, 16, and 17. A probable interpretation is 
that no significant relationship exists between 
either the quality of academic achievement 
in college or the receiving of marks above or 
below those predicted and the adjustive char- 
acteristics measured by this inventory test. 
It would therefore appear that the Bell test 
has characteristics similar to those of other 
current adjustment questionnaires, for Perry,
in his statistical analysis of the relationship
between the various questionnaires and
academic attainment, found no significant
relationships with any of them. Further-
more, this finding in general agrees with that

Wagner, op. cit., p. 229.

Loc. cit.

Raymond C. Perry, A Group Factor Analysis of the Adjustment Questionnaire (Southern California Education Monographs, 1933-1934 Series, No. 5; Los Angeles: University of Southern California Press, 1934), p. 78.

The questionnaires which Perry used were the Laird Personal Inventories B2 and C2, the Bernreuter Personal Inventory, Scales B1-N, B2-S, B3-I, and B4-D, the Allport Reaction Study, and the Pressey X-O Tests (Affectivity and Idiosyncracy).


of other investigators whose reports are sum-
marized by Toops and Kuder in their review
of investigations concerned with the relation
of personality to scholastic records. They
state: “With few exceptions, personal data
have proved to be of little use in prognosti-
cating college achievement.”

2. Possible relationships between the various
types of adjustment measured and the tests
used are indicated by the following statements
(Table XXV):

a. The means of the total negative deviates 
(C) on all four tests are smaller than the 
means for the total positive deviates (A). 
This finding suggests better home, health, 
social, and emotional adjustments for the 
subjects whose college marks fell below the 
level expected (C). The chances in one hun- 
dred that there is a difference greater than 
zero are eighty, fifty-eight, eighty-four, and 
sixty-two, respectively. 

b. Except on the social adjustment test, 
the differences between the good positively 
deviating group (1A) and the good negatively 
deviating group (1C) are in favor of the 
former. This result suggests a relationship of 
high college averages to home, health, and 
emotional adjustments. 

c. The same relationship and suggestion 
are indicated for the good positive deviates 
(1A) over the poor positive (3A) deviates. 

d. The students who were predicted to make rela-
tively low scholastic records and who obtained
lower ones (3C) showed a consistent superi- 
ority over all groups on all four tests. This 
finding may indicate an association of rela- 
tively good home, social, health, and 


Herbert A. Toops and G. Frederic Kuder, “Psychological Tests,” Review of Educational Research, V (June, 1935), 223.


TABLE XXV

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SUBDIVISIONS OF THE BELL
ADJUSTMENT INVENTORY EXPRESSED IN NUMBER OF CHANCES IN ONE HUNDRED THAT
DIFFERENCES ARE GREATER THAN ZERO

                          Difference between Means*
                    Home          Social        Health        Emotional
Groups              adjustment    adjustment    adjustment    adjustment

Total C-Total A       80             58            84             62
1A-1C                 83            -68            79             96
1A-3A                 60            -64            66             63
3C-1A                 98.9           77            74             53
3C-1C                 99.7           61            90             96
3C-3A                 99.3           65            85             65
3A-1C                 76            -55            65             93

* Except where indicated by a minus sign the difference is in favor of the group listed first in each pair.


March, 1939


needed to throw light upon the major problem 
with which this investigation has been 
concerned. 


I. SPECIFIC CONCLUSIONS


1. None of the fourteen measures used in- 
dicates reliably that students, with the excep- 
tion of those classified as superior, will attain 
scholastic records in their freshman year 
better or poorer than those predicted by a 
regression equation based on high school aver- 
ages and American Council percentile scores. 
However, there are indications that students 
in general who do better work than expected 
will (a) receive relatively low scores on the 
Shank Test of Reading Comprehension, (b) 
obtain scores on the Bell Adjustment In- 
ventory which suggest poor home, health, 
social, and emotional adjustments; and (c)
receive relatively high total scores on the
Sones-Harry High School Achievement Test
and on the social science, natural science, and
language and literature sections of this test.

2. Students predicted to do superior col-
lege work (1A and 1C) may be reliably ex-
pected to make academic college records bet-
ter than their high school records if they test
high on the mathematics section of the
Sones-Harry High School Achievement Test.

Positive deviation of students of good
promise (1A) is further indicated by (a)
relatively good home, health, and emotional
adjustments; (b) high scores on the
Sones-Harry High School Achievement Test,
on the social science section of that test, on
both the reasoning and computation sections
of the Progressive Mathematics Tests, and
on the Barrett-Ryan English Test; and, to
a lesser extent, (c) high scores on the
Shank Test of Reading Comprehension and on
the natural science and the language and
literature sections of the Sones-Harry High
School Achievement Test, together with a
relatively poor social adjustment, as measured
by the Bell Adjustment Inventory.

It was found that all differences between
the groups on all measures, with the excep-
tion of social adjustment, were in favor of
the students of good promise who exceeded
expectancy (1A).

3. No reliably significant difference was
shown on any of the measures used between
the two deviating groups of the students pre-
dicted to be less successful in college (3A and
3C).

The less successful group of students of
poor promise (3C) were indicated as superior
to similar students who were more successful
(3A) by higher scores on all academic meas-
ures excepting social science.

They were also favored on all measures of
adjustment. This finding is almost the reverse
of the finding for the deviating groups of
students of better promise (1A and 1C).

A larger percentage of the more successful
students of poor promise (3A) had made a
vocational choice before entering college than
the less successful students of poor promise
(3C), whereas there was very little difference
between the deviating groups of the students
of good promise (1A and 1C).

4. The more successful group of students
of good promise (1A) was shown to be
reliably superior to the more successful group
of students of poor promise (3A) on all
academic measures.

No reliable differences between these groups
were obtained on the measures of adjust-
ment. In all types of adjustment, excepting
social adjustment, there was an indication of
superiority in favor of the students of good
promise (1A).

The percentage of students of poor promise 
(3A) who had made a choice of life-career 
was far larger than the percentage of students 
of good promise (1A). 

5. Completely reliable superiority of the 
good negative deviates (1C) over the poor 
negative deviates (3C) was found on the 
measures of high school achievement, language 
and literature, social science, reading com- 
prehension, and the mechanics of English. 

Nearly reliable differences in favor of the 
students of good promise who did not obtain 
the college averages predicted (1C) were 
shown for the measures of natural science 
information and mathematical ability. 

While no reliable differences were indicated 
between the two groups of less successful 
deviates (1C and 3C) on the adjustment 
measures, suggestions of differences on all 
measures in favor of the less successful 
students of poor promise (3C) were shown. 

A much larger percentage of the students 
of poor promise who had not made the col- 
lege averages expected (3C) had made their 
occupational choice than had the students of 
good promise whose college records did not 
come up to those predicted (1C). This com- 
parison reflects a condition similar to that 
made a definite vocational choice in contrast
to 66.34 per cent of the total group of stu- 
dents of poor promise. 

The findings suggest that the fact that a 
student has indicated that he has made a 
definite vocational decision before beginning 
his college program does not account to any 
appreciable degree for work inferior or supe- 
rior to that predicted for him. The chances 
are about even that if a student obtains a 
college record as predicted he will have made 
a vocational choice, and if he is a student of 
poor promise, the chances are appreciably 
greater (about 10 in 100) that his choice
will have been made than if he were a student 
of good promise. Vocational decision appears, 
therefore, to be more closely, although in- 
versely, related to the quality of scholarship 
than it is to variation of the achieved record 
from the predicted record. 

Investigators report conflicting findings con- 
cerning the relationship between occupational 
decision and the quality of scholastic work. 
Harris found no statistically reliable rela- 
tionship, and in his survey of literature he 
presents the relation of occupational choice 
and school marks as reported by four in- 
vestigators. He states: 


| Vol. 7> No 


According to Lloyd-Jones, superior stu-
dents tend to have a more definite idea of
why they came to college and what they
want to be than the average student. Craw-
ford also finds higher grades associated
with definiteness of occupational choice
(but not with the knowledge of a definite
position awaiting one after graduation; or 
with unhampered choice of one’s own occu- 
pation). On the other hand Kefauver and 
Shuttleworth did not find that those with 
definite occupational choices did better 
than those without. Shuttleworth also re- 
ports no relationship between reasons given 
for coming to college and grades.** 


Achilles** in his study of 4,527 under- 
graduates in fifty colleges reported forty-one 
per cent of the “decided” group as above 
average in scholarship, and only seven per 
cent below, while but twenty per cent of the 
“undecided” group were above average and
fourteen per cent were below. Marshall* 
reported similar conclusions from his study 
of ninety-one college seniors. He found that 
the “decided” group averaged higher than 
the “undecided” group by four-tenths of a 
grade mark in the freshman year and about 
one-fourth of a grade mark in the first three 
years of college. On the other hand, Wagner
discovered no effect of the time of occupa-
tional decision upon college success.

The findings of this investigation are in 
agreement with Wagner in that vocational 
choice appears to be unrelated to the receiving 
of better or poorer college averages than those 
predicted. It also is in apparent agreement 
with those investigators who found no signi- 
ficant relationship between life-career decision 
and good scholarship. 


CHAPTER V 
CONCLUSIONS 


In this chapter conclusions concerning the 
findings of the investigation will be presented 
under three major headings, namely, specific 
conclusions, general conclusions, and sugges-
tions concerning further experimentation.


Harris, op. cit., pp. 10-11.

Paul S. Achilles, “Vocational Motives in College, Extent and Significance of Career Decisions,” Occupations, XIII (April, 1935), 624-628.

%3M. V. Marshall, “Life-Career Motive and Its Effect on 

1 Work,” Journal of Educational Research, XXI\\ 
(April, 1936), 596-598. 

Wagner, op. cit., p. 223.
social, and emotional adjustments may serve
as incentives for students to do better work 
in college than expected, perhaps by way of 
compensation for feelings of inferiority in 
these phases of living. However, exception 
must be made in the case of the superior and 
most successful group (1A) for whom good 
adjustments appear to be either an incentive 
or a concomitant of improved attainment. 

Because of the similarity of the relation- 
ships between all groups on the home, health, 
and emotional adjustment sections of the Bell 
Adjustment Inventory, the use of these ad- 
justment categories may be questioned, for 
the designations may be more in the nature 
of arbitrarily assigned categories than repre- 
sentative of discrete factors in personality 
adjustment. 

14. The attainment of college averages 
either superior or inferior to those predicted 
was shown to have practically no relationship 
to definite choices of life work by students
before the beginning of their college program. 

There was some indication that a voca- 
tional decision served as a stimulus to stu- 
dents of poor promise (3A) to improve upon 
the expected quality of their work. 

The choice of a life career is more typically 
a characteristic of students of poor promise 
(3A and 3C) than of students of good promise 
(1A and 1C). 


II. GENERAL CONCLUSIONS 


On the basis of the findings of this in- 
vestigation these general conclusions have 
been formulated: 

1. The use of high school averages uncom- 
bined in a regression equation with American 
Council percentile scores will serve most 
practical purposes. The increased labor in- 
volved in the use of combined criteria seems 
disproportionate to the increase in the ac- 
curacy of prediction, especially when the 
probable values inherent in the use of 
predictive measures are considered. 

2. The classification of deviating groups of 
students in accordance with their good or poor 
promise as college students appears to have 
been justified, for reliable differences on 
various measures were shown between such 
groups although none were shown when all 
students who exceeded expectancy (1A and 
3A) were compared with the total group of 
students whose college averages were actually 
below those predicted for them (1C and 3C). 
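The classification discussed in these conclusions rests on two steps: a regression equation predicts each freshman's college average from high school average and American Council percentile score, and students are then grouped by promise (1 good, 2 average, 3 poor, according to the predicted average) and by deviation (A above, B close to, C below the prediction). The Python sketch below illustrates that logic under stated assumptions. The cut-off points `good_cut` and `poor_cut` and the width `band` of the non-deviating zone are hypothetical parameters, since this section does not give the values actually used; the sample data are invented; and `fit_regression` and `classify` are illustrative names, not the author's procedure.

```python
from statistics import mean


def fit_regression(hs_avg, council_pct, college_avg):
    """Least-squares fit of college_avg = b0 + b1*hs_avg + b2*council_pct,
    solved from the two normal equations in deviation-score form."""
    m1, m2, my = mean(hs_avg), mean(council_pct), mean(college_avg)
    d1 = [v - m1 for v in hs_avg]
    d2 = [v - m2 for v in council_pct]
    dy = [v - my for v in college_avg]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero only if the two predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def classify(predicted, obtained, good_cut, poor_cut, band):
    """Label a student such as '1A' or '3C'.

    Promise (1 good, 2 average, 3 poor) comes from the predicted average;
    deviation (A above, B near, C below) from obtained minus predicted.
    good_cut, poor_cut and band are hypothetical thresholds, not the study's values.
    """
    if predicted >= good_cut:
        promise = "1"
    elif predicted <= poor_cut:
        promise = "3"
    else:
        promise = "2"
    residual = obtained - predicted
    if residual > band:
        deviation = "A"
    elif residual < -band:
        deviation = "C"
    else:
        deviation = "B"
    return promise + deviation


if __name__ == "__main__":
    # Invented illustration data, not figures from the study.
    hs = [80, 85, 90, 75, 95, 88]
    pct = [50, 60, 90, 30, 70, 40]
    college = [1.2, 1.9, 2.6, 0.8, 2.7, 1.6]
    b0, b1, b2 = fit_regression(hs, pct, college)
    for h, p, c in zip(hs, pct, college):
        pred = b0 + b1 * h + b2 * p
        print(h, p, c, round(pred, 2), classify(pred, c, good_cut=2.2, poor_cut=1.5, band=0.15))
```

Keeping promise and deviation as separate labels is what allows comparisons such as 1A with 3A, which the pooled A-versus-C comparison would hide.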
3. Measures of academic achievement and
of non-academic adjustments at the time of 
entrance to college, on the whole, render rela- 
tively little assistance in predicting whether 
students’ scholastic adjustments in college will 
exceed or fall below expectancy. However, 
when they are administered with discretion 
their use is probably justified because they 
help in a small way to provide administra- 
tors, personnel workers, and instructors with 
additional knowledge and insights concerning 
the students with whom they are associated. 


4. While the statistical treatment of group 
data concerning entering college students has 
certain values, as suggested in the conclusions 
above, these values are definitely limited, 
especially in relation to the individual student. 
Great reliance upon statistical findings may 
lead to a failure to view each student as a 
unique personality worthy of individual and 
special consideration. From this point of 
view, specific analysis of all measures for a 
particular student may provide valuable in- 
formation and insights concerning his nature, 
achievements, potentialities, and needs. As 
materials for individual student conferences, 
the measures used appear to be most useful. 
Many of the values inherent in them would 
be lost if they were treated statistically and 
if the findings were used only for general 
administrative purposes. The data of this 
investigation (Chart 19) show that some of 
the most potentially able students, as indi- 
cated by all the measures applied, actually 
make unexpectedly and markedly inferior 
records in college. The data also show that 
some of the students whom the criteria dis-
tinguish as of markedly inferior promise, and
whose previous accomplishments and adjust- 
ments are in agreement, actually attain college 
records higher than those even of students of 
superior promise who receive averages higher 
than those expected. It is these observations 
that lead to the emphasis upon the need (a) 
to view the individual student as a distinct 
person, (b) to work with him with all avail- 
able knowledge, and (c) to avoid the smug 
feeling of contentment that comes with 
dependence upon the general conclusions of 
statistical analyses, which, as this study has 
indicated, obscure important data regarding 
the individual. There is obviously no such 
thing as a generalized person, however reliable 
the ratios of the differences between the means 
indicated for the good and poor groups of
students whose attainment exceeded that 
expected (1A and 3A). 

6. The students of good promise who 
obtained better college averages than pre- 
dicted (1A) were found reliably to excel the 
less successful students of poor promise (3C) 
on all measures of academic ability. 

Between these two groups no reliable differ-
ences were indicated for the measures of adjust-
ment. However, all differences were in favor
of the less promising group of students.


The group of students of poor promise 
(3C) also showed a greater percentage who 
had made their vocational decision. 

7. When the students of good promise who 
did not achieve predicted averages (1C) were 
compared with the students of poor promise 
who exceeded expectancy (3A) it was found 
that the former were superior to the latter 
on all academic measures. These differences 
were completely reliable on all the tests ex- 
cepting those dealing with natural science, 
social science, and mathematical computation. 
On these last three measures the differences 
were nearly reliable. 

No reliable differences on adjustment 
measures were obtained. However, there were 
indications of better home, health, and emo- 
tional adjustment for the students of poor 
promise who obtained college averages better 
than those predicted for them (3A). 

Choices of life-careers had been made by 
an appreciably larger percentage of the stu- 
dents of poor promise (3A and 3C) than of 
the students of good promise (1A and 1C). 
This difference conforms to the findings in all 
comparisons of deviating groups of students 
of poor promise (3A and 3C) with deviating 
groups of students of good promise (1A and 
1C), regardless of whether the variation was 
positive or negative. 

8. Students who have been predicted to do
good work (1A and 1C) at the time of col-
lege entrance demonstrate a better mastery
of the mechanics of English usage in writing
than students who have been predicted to do
less satisfactory work (3A and 3C). Students
of good promise with relatively high ability
in written language may be expected to attain
better scholastic adjustment in college than
predicted, and students of good promise whose
measures of English usage appear relatively
low in the distribution may be expected to
obtain college averages lower than those pre-
dicted for them. Therefore, for a student of
good promise, English usage, as measured by
the Barrett-Ryan test, appears to be a quality
that is either related to the improvement of
academic attainment or exists as a concurrent
factor with it.

9. Relatively high ability to read compre-
hendingly is a characteristic which differenti-
ates students of good promise (1A and 1C)
from students of poor promise (3A and 3C)
at the beginning of their college careers. How- 
ever, it does not indicate reliably a relation- 
ship between the predicted college averages 
and those actually recorded. 

10. Students of good promise (1A and 1C) 
possess superior ability in both the reasoning 
and computation aspects of mathematics. 
However, this investigation does not show 
whether it is the mastery of mathematics, 
or the possession of those qualities which 
facilitate such mastery, that differentiates the 
superior student from the student of lesser 
achievement (3A and 3C). Neither do the 
findings indicate reliably that mathematical 
ability serves to influence students to do either
better or poorer work in college than predicted 
from their previous scholastic records and 
American Council percentile scores. 

11. Superior knowledge of both natural 
science and social science is indicated, but not 
with complete reliability, as both a charac- 
teristic of students of good promise (1A and 
1C), and of students whose standard of per- 
formance in college is better than expected 
(1A and 3A). 

12. Information concerning literary, social, 
and scientific facts, as measured by the tests 
used, appears to be associated with scholastic 
achievement. There are some indications that 
such knowledge contributes to the attainment 
of a better scholastic adjustment than ex- 
pected, one which is not accounted for by 
superior intelligence. 

13. No reliable difference was shown 
between students of good promise (1A and 
1C) and students of poor promise (3A and 
3C), and no significant relationship between 
college records obtained and college records 
predicted were found by the use of any of 
the measures of adjustment. 

The fact that the means on all four tests 
were greater for the total positively deviating 
group (A) than for the total negatively de- 
viating group (C) indicates, although not 
reliably, that relatively poor home, health, 
TABLE XXIV

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART D: EMOTIONAL ADJUSTMENT

Groups compared     Means              Difference   S.E. of      Diff./     Chances
Group 1  Group 2    Group 1  Group 2   between      difference   S.E.†      in 100
                                       means
1A       1C         10.04    13.31     3.27         1.84          1.77      96
1A       3A         10.04    10.55     0.51         1.54          0.33      63
1A       3C         10.04     9.92     0.12         1.42         -0.08      53
1C       3A         13.31    10.55     2.76         1.92         -1.44      93
1C       3C         13.31     9.92     3.39         1.82         -1.86      96
3A       3C         10.55     9.92     0.63         1.52         -0.42      65
A        C          10.87    10.59     0.28         0.92         -0.32      62

* A low score signifies good adjustment.

† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.
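The "chances in 100" column can be related to the ratio of each difference to its standard error. Assuming (this section does not state it explicitly) that the figures are one-tailed normal-curve probabilities, that is, 100 times the area of the normal curve below the absolute value of that ratio, the following Python sketch recomputes them from the differences and standard errors in Table XXIV. Its results agree with the printed column to within one point; the small discrepancies presumably come from rounding or table interpolation in the original computations. The helper names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chances_in_100(diff: float, se_diff: float) -> int:
    """Chances in 100 that the true difference lies on the same side of zero as
    the observed one, taken as a one-tailed normal probability of |D / S.E.|."""
    critical_ratio = abs(diff) / se_diff
    return round(100 * normal_cdf(critical_ratio))


# (difference between means, S.E. of difference) from Table XXIV
ROWS = {
    "1A-1C": (3.27, 1.84),
    "1A-3A": (0.51, 1.54),
    "1A-3C": (0.12, 1.42),
    "1C-3A": (2.76, 1.92),
    "1C-3C": (3.39, 1.82),
    "3A-3C": (0.63, 1.52),
    "A-C": (0.28, 0.92),
}

if __name__ == "__main__":
    for pair, (diff, se) in ROWS.items():
        print(pair, round(diff / se, 2), chances_in_100(diff, se))
```

On this reading, a figure such as 96 chances in 100 corresponds to a critical ratio of roughly 1.77, well short of the ratio of about 3 usually required at the time before a difference was treated as completely reliable.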
[Chart: Reliability of Differences between the Means of Paired Groups for Bell Adjustment Inventory]
deviating students of good promise (1C)
again suggests the possibility that variations 
from predicted achievement which result in 
improved scholastic records may be associated 
with emotional adjustment. However, some 
doubt is cast upon this interpretation when it 
is noted that there is only a slight difference 
(65 chances in 100) between the positive and 
negative groups of students of poor promise 
(3A and 3C). 


The findings of the present study conform 
with those of previous investigations, insofar 
as comparisons can be made within the limita- 
tions of the measures and procedures used by 
the various investigators. Stagner, in his
review of forty-five investigations, reported
almost uniformly low, zero, or slightly nega-
tive correlations between favorable person-
ality qualities and school achievement. Pint-
ner, in his survey of investigations, found
the same thing. In Stagner’s study, which in-
volved seven different tests, he reported .15 
as the highest correlation obtained. His con- 
clusions are pertinent. They are: 


1. Linear correlations of intelligence, 
achievement and personality measures are 
low and are probably so as a result of the 
inherent nature of the relationship. 


2. Extreme personality trends seem to 
counterbalance advantages in aptitude, 
making for equal achievement in opposed 
groups. High emotionality and high self- 
sufficiency lead to lower achievement than 
would be predicted from intelligence 
scores.”* 


Harris** reported conflicting findings by 
various investigators. In his investigation he 
found that extroversion and a feeling that one 
is handicapped, as measured by pencil-and- 
paper tests, characterized students who re- 
ceived lower grades than those predicted by 
their scores on the Alpha test.


™ Ross Stagner, “‘The Relation of Personality to Academic 
fate and Achievement,”” Journal of Educational Research, 
XXVI (May, 1933), 648-660. 


Rudolph Pintner, “Intelligence Tests,’’ Psychological 
Bulletin, II, No. 7 (July, 1935), 453-472. 


% Stagner, op. cit., p. 655. 
* Harris, op. cit., pp. 7-8. 
% Ibid., p. 48. 
[Chart: Means and Ranges of Subgroups on Distribution of Obtained College Averages and Predicted Averages]
of groups of individuals on any measure to
their standard errors may be. 

5. The reliable findings, together with less 
reliable indications with regard to the rela- 
tionship of measures to unpredicted scholastic 
records, although limited, may possibly con- 
tribute in some small measure to the formula- 
tion of new or modified administrative and 
personnel procedures, to the development of 
better adapted curricula, to a new synthesis 
of educational principles, to a quickened 
awareness of the unique quality of student 
personality, and to a stronger determination 
to increase the favorable influences and de- 
crease the unfavorable influences which affect 
growing and developing personality, in so far 
as the power lies within the province and 
limitations of the administrator, personnel 
worker, and teacher to accomplish these ends. 


III. SUGGESTIONS FOR FURTHER
INVESTIGATION 


1. One of the most notable observations 
concerning the characteristics of the different 
groups is the apparent paradox shown by the 
superiority on many of the measures of the 
students of poor promise who accomplished 
less than expected (3C) over other students 
of poor promise who obtained better records
than expected (3A). This superiority was
either reliably indicated or found with chances
greater than eighty in one hundred for meas-
ures of predicted college average, mechanics
of written English, reading comprehension, 
general mathematical information and mathe- 
matical reasoning, home adjustment, and 
health adjustment. Need for more informa- 
tion concerning this group (3C) is further 
suggested by the fact that the average intel-
ligence of its members, as measured by the
American Council test, is reliably superior to
that of the group of students of poor promise 
who did work superior to that predicted for 
them (3A). Experimentation in this connec- 
tion should probably center upon the types 
and quality of motivation, upon the nature 
of the environmental influences playing upon 
students in college, outside of college, and 
before college entrance, and upon the nature 
and type of the personal limitations and 
adaptations of individual students rated as 
having poor promise. 


2. Knowledge of the characteristics of the 
students’ native endowments and of their 
adjustments and achievements made before 
entrance to college is valuable and necessary. 
The same is true concerning the bearing that 
these may have upon the quality of their 
academic adjustment in college. While this 
investigation has been carried on in the hope 
of increasing information of this sort, at no 
time has the assumption been held that aca- 
demic adjustment as defined herein is entirely 
desirable. Further investigations are needed 
to find answers to such questions as: Just 
what is it that students adjust to when they 
are awarded the symbols of such adjustment 
in professors’ marks, or perchance in an A.B. 
degree, cum laude? Are the values represented 
by the symbols the most important values in 
terms of the potential and essential contribu- 
tions of higher education in a democracy? In 
terms of individual and social well-being are 
not materials other than those traditionally 
offered more valuable? In fact, may not 
adjustment to what exists in college be 
representative of a handicap more than a help 
from the viewpoint of long-time individual and 
social values? Are there not artificialities, 
vested interests, idiosyncrasies of institutional
traditions and faculty personalities, and other 
negative factors which enter into the admin- 
istration of symbols of academic adjustment 
that have no legitimate place in an educa-
tional institution? What are the valid pur-
poses of higher education in a democracy, 
and how can student progress in achieving 
these objectives be measured? 


3. Further experimentation is needed con- 
cerning services, procedures, philosophies, and 
materials which may be introduced into the 
college environment which will take students, 
with the endowments, accomplishments, and 
potentialities with which they enter college, 
and cause them actually to rise superior to 
all of the prediction formulae thus far 
developed. For example, what types of 
remedial work in the fields of knowledge or 
skill can be provided with profit? What 
methods shall be followed, and what is their 
relative effectiveness? What adaptations of 
curricula to student need can be made, and 
what are student needs? What guidance serv- 
ices will render the most valuable assistance 
to students, and how can curriculum and 
guidance be made more nearly functions of 
the same thing? How can a sound set of 
objectives for a college be formulated, and 
how can the objectives be made a part of the 
nervous systems of administrators, personnel 
workers, and teachers? Further information 
on all these questions is needed if the 
learning environments and the quality of stu- 
dent adjustment in college are to be 
improved. 

THE VALUE OF CERTAIN FACTORS FOR DIRECT AND
DIFFERENTIAL PREDICTION OF ACADEMIC SUCCESS* 


CLAUDE L. NEMZEK 
University of Detroit 


Intelligence tests are being used extensively 
for purposes of prediction; however, studies 
carried on by numerous investigators under a 
wide variety of conditions have adequately 
demonstrated that the functions measured by 
the types of intelligence tests now available 
show only a moderate degree of relationship 
to academic achievement. Reviews of these 
studies indicate that, among typical groups 
of high school students, the Pearson product- 
moment coefficient of correlation between in- 
telligence test results and academic achieve- 
ment is approximately .50 (Eurich and Carroll,¹ Lee,² Pintner,³ Segel,⁴ Turney⁵). This
degree of relationship suggests that non- 
intellectual factors play a large part in con- 
ditioning scholastic success. In order to be 
able to predict academic success more ade- 
quately than is possible by using intelligence 
tests alone, one must segregate those non- 
intellectual factors which, independent of the 
functions measured by our available intelli- 
gence tests, bear a significant relationship to 
measures of academic performance. 

The present study represents an attempt to 
determine the value of a number of non- 
intellectual factors for predicting school suc- 
cess as measured by teachers’ marks. 

The purpose of this study is to present an 
analysis of data in order to reveal any pos- 
sible values that chronological age at entrance 
to elementary school, amount of education of 
father, amount of education of mother, and 
occupational status of father may have for 
direct and differential prediction of academic 
success as measured by teachers’ marks. 


* From a thesis submitted to the Graduate Faculty of the 
University of Minnesota in partial fulfillment of the require- 
ments for the degree of Doctor of Philosophy. 

¹ Alvin C. Eurich and Herbert A. Carroll, Educational Psychology. Boston: D. C. Heath and Co., 1935. Pp. 436.

² J. Murray Lee, A Guide to Measurement in Secondary Schools. New York: D. Appleton-Century Co., Inc., 1936.

³ Rudolph Pintner, Intelligence Testing. New York: Henry Holt and Co., 1931. Pp. 555.

⁴ David Segel, Prediction of Success in College. Office of Education Bulletin 1934, No. 15. Washington: United States Government Printing Office, 1934. Pp. 98.

⁵ Austin H. Turney, Factors Other Than Intelligence That Affect Success in High School. Minneapolis: University of Minnesota Press, 1930. Pp. 135.





Data were available in the records at Uni- 
versity High School, University of Minne- 
sota, for 196 boys and 156 girls. All of these 
cases had been graduated from University 
High School. The following ten variables 
were tabulated for each of the 352 cases: 


(1) Intelligence quotient
(2) Chronological age in months at entrance to elementary school
(3) Amount of education of father in years
(4) Amount of education of mother in years
(5) Occupational status of father on the Minnesota Scale
(6) Honor point average in mathematics
(7) Honor point average in science
(8) Honor point average in English
(9) Honor point average in history and social science, and
(10) Honor point average in languages.

Only those cases were used wherein the 
results of at least two years of course-work in 
each of the five subject-matter fields were 
available. 


The measure of intelligence used in this 
study was based upon the results of five 
group intelligence tests. The tests employed 
were Army Alpha 8, Pressey Senior Classifi- 
cation, Haggerty Delta 2, Terman Group 
Test, Form A, and Miller’s Mental Ability 
Test, Form A. Intelligence quotients were 
computed from the results of each test for 
each individual. The authors’ manuals for 
the respective tests were followed as closely 
as possible in administering and scoring, and, 
for all cases except that of the Pressey Test,
in computing the intelligence quotients. In 
this instance, where the author’s norms proved 
inadequate for children who made unusually 
high scores, the difficulty was resolved by 
extrapolation. The intelligence quotients 
were in all instances converted into Stanford-Binet equivalents by means of the method
proposed by Miller.⁶ Of the five intelligence
quotients, the middle value was chosen as the 
measure to be used for each individual. 
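Choosing the middle one of five quotients amounts to taking their median. A minimal Python sketch, assuming the five quotients have already been converted to Stanford-Binet equivalents (Miller's conversion is not reproduced here); the function name and the example quotients are hypothetical:

```python
from statistics import median

def middle_iq(quotients):
    """Return the middle value of one pupil's five intelligence quotients."""
    if len(quotients) != 5:
        raise ValueError("expected exactly five intelligence quotients")
    return median(quotients)

# Hypothetical quotients from the five group tests for one pupil
print(middle_iq([112, 118, 121, 109, 125]))  # 118
```

With an odd number of results the median is always one of the observed quotients, so no averaging of test results is involved.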

Marks at University High School are given 
in the form of letter ratings. For the present 
study the letter ratings were converted into 
honor point averages. Each quarter hour 
mark of A was given three honor points; each 
quarter hour mark of B, two honor points; 
each quarter hour mark of C, one honor 
point; each quarter hour mark of D, no 
honor points; and each quarter hour mark of
F, minus one honor point. Then the total 
number of honor points in each of the five 
subject-matter fields was divided by the total 
number of quarter hours of marks involved in 
the respective subject-matter fields in order 
to obtain the five honor point averages. 
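The honor point computation is a weighted average over quarter hours of credit. The Python sketch below follows the point values stated above; the `(letter, quarter_hours)` input layout and the sample record are hypothetical.

```python
# Honor points per quarter hour for each letter mark, as described in the study.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}

def honor_point_average(marks):
    """Honor point average for one subject-matter field.

    `marks` is a list of (letter, quarter_hours) pairs, a hypothetical
    input layout chosen for this sketch.
    """
    total_points = sum(HONOR_POINTS[letter] * hours for letter, hours in marks)
    total_hours = sum(hours for _, hours in marks)
    if total_hours == 0:
        raise ValueError("no quarter hours of marks in this field")
    return total_points / total_hours

# Hypothetical record: 5 quarter hours of A, 5 of C, and 2 of F
print(honor_point_average([("A", 5), ("C", 5), ("F", 2)]))  # 1.5
```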


In Table I are presented the Pearson 
product-moment coefficients of correlation 
obtained by computing separately for boys 
and girls all of the intercorrelations for the 
ten variables which were available, together 
with the means and standard deviations of 
the ten variables. 
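Table I rests on the Pearson product-moment formula applied within each sex. A minimal Python sketch of that computation follows; the record layout and the field names (`sex`, `iq`, `hpa_math`) are hypothetical, and the sketch is illustrative rather than a reproduction of the author's procedure.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment coefficient of correlation between two lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_by_sex(records, var_a, var_b):
    """Correlate two variables separately for boys and for girls.

    `records` is a list of dicts with a "sex" key ("boys" or "girls") plus one
    key per variable; this layout is hypothetical.
    """
    result = {}
    for sex in ("boys", "girls"):
        group = [rec for rec in records if rec["sex"] == sex]
        result[sex] = pearson_r([rec[var_a] for rec in group],
                                [rec[var_b] for rec in group])
    return result

# Hypothetical records, far fewer than the 196 boys and 156 girls of the study
records = [
    {"sex": "boys", "iq": 100, "hpa_math": 1.0},
    {"sex": "boys", "iq": 110, "hpa_math": 1.5},
    {"sex": "boys", "iq": 125, "hpa_math": 1.8},
    {"sex": "girls", "iq": 105, "hpa_math": 1.2},
    {"sex": "girls", "iq": 118, "hpa_math": 1.1},
    {"sex": "girls", "iq": 130, "hpa_math": 2.0},
]
print(r_by_sex(records, "iq", "hpa_math"))
```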

Of the forty coefficients of correlation 
showing the extent to which chronological age 
at entrance to elementary school, amount of 
education of father, amount of education of 
mother, and occupational status of father are 
related to honor point averages in mathe- 
matics, science, English, history and social 
science, and languages, not one is statistically 
significant, in the sense of exceeding four 
times its probable error. We may therefore 
conclude that the value of chronological age 
at entrance to elementary school, amount of 
education of father, amount of education of 
mother, and occupational status of father, for 
the direct prediction of academic success as 
measured by honor point averages derived 
from teachers’ marks, is negligible. 
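The significance criterion rests on the classical probable error of a correlation coefficient, 0.6745(1 - r²)/√N. The Python sketch below applies it with N = 196 to a hypothetical r of .10 and to the .456 printed in the boys' intelligence-quotient row of Table I; for the latter it reproduces the printed probable error of .038. The function names are illustrative.

```python
import math

def probable_error(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def exceeds_four_pe(r, n):
    """The study's criterion: |r| larger than four times its probable error."""
    return abs(r) > 4 * probable_error(r, n)

for r in (0.10, 0.456):
    pe = probable_error(r, 196)  # 196 boys
    print(f"r = {r:.3f}  P.E. = {pe:.3f}  exceeds 4 P.E.: {exceeds_four_pe(r, 196)}")
```

Because the probable error grows as N shrinks, the threshold a coefficient must pass is somewhat higher for the 156 girls than for the 196 boys.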

Despite the fact that certain variables may 
be of little value for purposes of direct pre- 
diction, they may be more valuable as prog- 
nostic of differential ability. This is primarily 
due to the fact that in finding a differential 
correlation coefficient a negative and a posi- 
tive direct coefficient may be brought together 
and the effect is an additive one. A survey of 
Table I reveals that of the 40 coefficients of 
correlation showing the relation of chrono- 
logical age at entrance to elementary school, 


⁶ W. S. Miller, “The Variation and Significance of Intelligence Quotients Obtained from Group Tests,” Journal of Educational Psychology, 15: 359-366, 1924.
amount of education of father, amount of
education of mother, and occupational status 
of father to the five honor point averages, 
twenty-two are positive and eighteen are neg- 
ative; furthermore, there are only six oppor- 
tunities for negative and positive direct co- 
efficients to have an additive effect to produce 
higher differential coefficients. 
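The additive effect described above can be seen in the algebra for correlating a predictor with the difference between two standardized criteria. This is a general textbook identity, offered as an assumption about the spirit of a differential coefficient rather than a reproduction of Segel's procedure. All coefficients in the example are hypothetical.

```python
import math

def differential_r(r_x1, r_x2, r_12):
    """Correlation of predictor x with z1 - z2, the difference of two standardized criteria.

    r_x1, r_x2: direct correlations of x with criterion 1 and criterion 2
    r_12: correlation between the two criteria (must be below 1)
    """
    if r_12 >= 1:
        raise ValueError("the two criteria must not be perfectly correlated")
    return (r_x1 - r_x2) / math.sqrt(2 * (1 - r_12))

# Hypothetical direct coefficients of similar size; criteria intercorrelated .70
same_sign = differential_r(0.10, 0.08, 0.70)       # signs agree: the numerator nearly cancels
opposite_sign = differential_r(0.10, -0.08, 0.70)  # signs differ: the magnitudes add
print(round(same_sign, 3), round(opposite_sign, 3))
```

When all the direct coefficients are small, the numerator stays small whatever the signs, which is consistent with the expectation that the differential coefficients here would be low.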

Despite the fact that the direct coefficients 
are so low that they almost preclude any sig- 
nificant differential coefficients, the value of 
chronological age at entrance to elementary 
school, amount of education of father, amount 
of education of mother, and occupational 
status of father for purposes of differential 
prediction was determined by Segel’s⁷ method.
In Table II are included the differential pre- 
diction coefficients* based upon the data avail- 
able for the 196 boys and the 156 girls under 
consideration. 

A study of Table II reveals that chrono- 
logical age at entrance to elementary school, 
amount of education of father, amount of 
education of mother, and occupational status 
of father have no significant value, practic- 
ally or statistically, for purposes of differen- 
tial prediction of the abilities measured by 
honor point averages in mathematics, science, 
English, history and social science, and lan- 
guages. Not one of the 80 differential predic- 
tion coefficients is as much as four times its 
probable error. 

The mental functions measured by honor point averages in mathematics, science, English, history and social science, and languages very probably have a high degree of community of function, as the data in Table I indicate. In the case of the boys, the intercorrelations obtained from these five honor point averages range from .611 to .766; in the case of the girls, from .710 to .825.

The data demonstrate that the intelligence quotient has considerable value for direct prediction. In Table I, one may note that the relationships of the intelligence quotient to the five honor point averages range from .401 to .502 for the boys; from .495 to .606 for the girls. These data corroborate the findings of other investigators.

The ineffectiveness of the intelligence quotient for differential prediction is clearly portrayed in Table II. Differential coefficients involving the intelligence quotient range from -.103 to .180 for the boys; from -.119 to .172 for the girls.


⁷ David Segel, Differential Diagnosis of Ability in School Children. Baltimore: Warwick and York, Inc.

David Segel, Prediction of Success in College. Office of Education Bulletin 1934, No. 15. Washington, D. C.: Government Printing Office, 1934. Pp. 98.

David Segel, “The Construction and Interpretation of Differential Ability Patterns,” Journal of Experimental Education, 2: 283-287, 1934.

David Segel, “Differential Prediction of Ability as Represented by College Subject Groups,” Journal of Educational Research, 25: 14-26, January, 1932; 25: 93-98, February, 1932.

David Segel, “Differential Prediction of Scholastic Success,” School and Society, 39: 91-96, January 20, 1934.

David Segel and J. R. Gerberich, “Differential College Achievement Predicted by the American Council Psychological Examination,” Journal of Applied Psychology, 17: 637-645, December, 1933.

J. Murray Lee and David Segel, “The Utilization of Data from Simple or Prediction in the Development of Regression Equations for Differential Prediction,” Journal of Educational Psychology, 24: 550-554, October, 1933.


TABLE I
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS, WITH PROBABLE ERRORS, OF TEN VARIABLES,* FOR 196 BOYS AND 156 GIRLS**

Boys, means of variables 1 to 10: 117.00, 71.16, 13.34, 11.85, 1.91, 1.010, 1.255, .980, 1.145, .895
Boys, S D's: 11.95, 3.73, 2.66, .94, .705, .745, .605, .620, .785

Var.  Girls'   Girls'   Intercorrelations (upper line) and their probable errors (lower line)
      mean     S D
1     117.40   12.60     .205   .179  -.170   .456   .499   .502   .437   .401
                         .046   .047   .047   .038   .036   .036   .039   .040
2      71.22    7.53    -.194   .093   .028  -.072  -.063  -.040  -.052  -.078  -.154
                         .048   .048   .048   .048   .048   .048   .048   .048   .047
3        .87    3.68     .522  -.665   .134   .170   .150   .082   .078
                         .035   .027   .047   .047   .047   .048   .048
4        .13    2.64     .451  -.329   .114   .070   .163   .097   .099
                         .043   .043   .048   .048   .047   .048   .048
5        .91    1.11     .692  -.309  -.128  -.181  -.143  -.094   .139
                         .028   .049   .047   .047   .047   .048   .047
6       .045    .745     .147   .162  -.107   .754   .642   .657   .662
                         .053   .053   .053   .021   .028   .027   .027
7       .100    .760     .131   .112  -.115   .800   .621   .679   .611
                         .053   .053   .053   .019   .030   .026   .030
8       .335    .705     .075   .101  -.116   .753   .764   .766   .691
                         .054   .053   .053   .023   .022   .020   .025
9       .260    .740     .057   .055  -.128   .741   .778   .825   .683
                         .054   .054   .053   .024   .021   .017   .026
10      .230    .865     .157   .136  -.206   .710   .758   .806   .787
                         .053   .053   .052   .027   .023   .019   .021

* Variables: (1) IQ, (2) CA in months at entrance to elementary school, (3) education of father in years, (4) education of mother in years, (5) occupational status of father, (6) HPA in mathematics, (7) HPA in science, (8) HPA in English, (9) HPA in history and social science, (10) HPA in languages.
** Means and S D's of boys at top; means and S D's of girls at left; intercorrelations for boys above and to right of major diagonal; intercorrelations for girls below and to left of major diagonal. The upper figure of each pair is the Pearson product-moment correlation coefficient; the lower, its probable error.


TABLE II
DIFFERENTIAL PREDICTION COEFFICIENTS* FOR THE TEN VARIABLES**

Variables      1        2        3        4        5
6-7         -.103    -.030    -.062     .056     .088
            -.119     .031     .020     .024     .014
6-8          .035    -.024     .010    -.035    -.010
             .014     .010     .111     .097     .004
6-9          .091     .010     .079     .036    -.010
             .056     .033     .126     .146     .028
6-10         .010     .012     .054     .00002   .030
            -.040     .055    -.042     .005     .158
7-8          .113     .0001    .060    -.077    -.081
             .124    -.082     .092     .028    -.010
7-9          .180     .033     .136    -.017    -.137
             .172    -.062     .155     .089     .010
7-10         .084     .013     .097    -.037    -.039
             .057    -.022    -.063    -.057     .158
8-9          .078     .040     .095     .092    -.068
             .055     .010     .026     .071     .010
8-10        -.020     .016     .052     .036     .039
            -.059     .055    -.162    -.091     .188
9-10        -.075     .013    -.014    -.030     .174
            -.100     .032    -.174    -.143     .156

* The upper figure of each pair is for the 196 boys; the lower for the 156 girls.
** Variables: (1) IQ, (2) CA in months at entrance to elementary school, (3) education of father in years, (4) education of mother in years, (5) occupational status of father, (6) HPA in mathematics, (7) HPA in science, (8) HPA in English, (9) HPA in history and social science, and (10) HPA in languages.


SUMMARY 


The purpose of this study was to determine 
the value of the intelligence quotient, chrono- 
logical age at entrance to elementary school, 
amount of education of father, amount of 
education of mother, and occupational status 
of father for purposes of direct and differen- 
tial prediction of academic success as meas- 
ured by honor point averages based upon 
teachers’ marks. Data were available for 196 
boys and 156 girls, all of whom had been
graduated from University High School, University of Minnesota, and all of whom had had at least two years of course-work in each of the subject-matter fields for which honor point averages were computed.

[Vol. 7, No. 3]

The data demonstrate that chronological 
age at entrance to elementary school, amount 
of education of father, amount of education 
of mother, and occupational status of father 
have negligible value for purposes of direct 
and differential prediction of academic success 
as measured by honor point averages derived 
from teachers’ marks; and that the intelli- 
gence quotient has value for direct prediction 
but not for differential prediction. 













A TECHNIQUE FOR THE MEASUREMENT OF
SOCIAL ADJUSTMENT* 


J. E. JANNEY 
Western College, Oxford, Ohio 


PURPOSE 


All too often it is assumed in faculty circles 
that the important criterion of a student’s 
success in college is his academic achievement, 
—and of course academic achievement is ex- 
pressed primarily in terms of grades. Attempts 
to develop instruments for prognosis of col- 
lege success use academic record in this nar- 
row sense as the major criterion; thus intelli- 
gence tests are commonly validated by corre- 
lating these tests with point-hour ratio or 
other similar data. However, it is being rec- 
ognized increasingly that the college should 
aim to accomplish more than the development 
of academic competence. The development of 
social competence (using this phrase in a 
broad sense) should also be a major objec- 
tive; students should learn at college how to 
get along with their fellows and should de- 
velop certain capacities for leadership. Fur- 
ther, adjustment to the other sex is (at least 
in a woman’s college) a problem to which pre- 
sumably the college should give some 
thought, since such adjustment is almost cer- 
tain to be a major factor in the future life 
and happiness of these young women. 


The purpose of this research was to make 
a systematic study of college guidance prob- 
lems, taking special account of the last two 
objectives—the development of social adjust- 
ment to the same sex and to the other sex. 
It is believed that the investigation is distinc- 
tive in two respects. In the first place, rela- 
tively objective indices of social ability in re- 
lation to the same sex and in relation to the 
other sex were developed. To develop these 
indices, and to use them along with the conventional index of grades in a study of college prognosis and guidance problems, is believed in itself important. In the second place,
all three criteria of success have been related 
to an unusual variety of other data, including 
not only intelligence test scores, but also re- 


* From a thesis presented in partial fulfillment of the requirements for the Ph. D. in Psychology at Ohio State University, 1935. The writer wishes to express his appreciation for the advice and counsel of Dr. S. L. Pressey, under whose direction this investigation was conducted.


sults of tests of interests and attitudes and 
ratings on a number of traits by both stu- 
dents and faculty. The total results would 
thus seem to be of exceptional range, and to 
offer rich opportunities for comparisons of 
possible significance. 


PROCEDURE 


The study was made in a small liberal arts 
college for women located near a co-educational state university in a typical college town in the Ohio valley. The data used
are from the upper three classes of the col- 
lege. The institution gave exceptional oppor- 
tunities for such a study. Since the college 
student body was small, it was possible to 
find leading students and faculty members 
who knew everyone in the student group. 
Every one of this student group was known 
by the writer. All the students live in dormi- 
tories. The total situation in the college and 
in the community is well known to the writer. 
Opportunities for studying the total life of 
these young women and their total develop- 
ment were thus exceptional. 

The three criteria for the three types of 
college success above mentioned were as fol- 
lows: (a) academic success was indicated by 
point-hour ratio, or the number of credit 
points divided by the number of semester 
hours; (b) success in relation to other women 
students was indicated by an index of success 
made out from each student’s campus record. 
Thus each office and honor on the campus 
was given a rating, such as class president 40, 
athletic letter 25, etc., and a great variety of 
such items considered, including memberships 
on committees and other like minor recogni- 
tions, so that almost all students had a record 
of some sort. (c) Success in relations with the 
other sex was indicated by number of evening 
dates for the nine month school year as ob- 
tained from the dormitory “sign-out” book in 
which each young woman is required to record 
the name of each male caller. Careful inquiry 
indicated that these records were reasonably 
accurate and might be considered a real criterion of the extent to which a girl had made
mutually satisfactory adjustments to the 
other sex. 
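The three criteria reduce to simple tallies. The Python sketch below computes each one for a single student. Only the weights 40 (class president) and 25 (athletic letter) and the definition of point-hour ratio come from the text; the third weight, the student names, and the record layouts are hypothetical.

```python
# Weights for campus offices and honors. Only "class president" (40) and
# "athletic letter" (25) are given in the study; the third entry is a
# hypothetical placeholder for the many minor recognitions.
CAMPUS_WEIGHTS = {
    "class president": 40,
    "athletic letter": 25,
    "committee member": 5,  # hypothetical weight
}

def point_hour_ratio(credit_points, semester_hours):
    """Academic criterion: credit points divided by semester hours."""
    return credit_points / semester_hours

def campus_index(record):
    """Criterion for relations with other women: sum of weighted campus honors."""
    return sum(CAMPUS_WEIGHTS[item] for item in record)

def date_count(sign_out_book, student):
    """Criterion for relations with the other sex: evening dates in the sign-out book."""
    return sum(1 for name, _caller in sign_out_book if name == student)

# Hypothetical data
print(point_hour_ratio(90, 36))
print(campus_index(["class president", "committee member", "committee member"]))
print(date_count([("Ann", "caller 1"), ("Ruth", "caller 2"), ("Ann", "caller 3")], "Ann"))
```

An item missing from the weight table raises a `KeyError`, which keeps an unrated honor from silently counting as zero.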

The “dependent variables” or further 
group of measurements above mentioned were 
as follows: (a) intelligence test scores 
(O.S.U.), (b) the Pressey Interest-Attitude 
Test, (c) the Thurstone Attitude Scale on 
Communism, and (d) ratings on the following personality traits: (i) cooperativeness,
(ii) sagacity, (iii) home background, (iv) 
emotional maturity, and (v) sophistication. 
The mid-rating of each of two groups of 
raters was selected, the two groups being 
(1) three members of the college faculty and 
(2) three officials of the student government 
association. The writer would emphasize the 
variety of the data gathered and emphasize 
especially the variety of relatively objective 
criterion variables. 


RESULTS 


The inter-relations of the criterion vari- 
ables with dependent variables were studied 
by means of the Toops correlation formula 
for the Hollerith machines. 

In analyzing the inter-correlations of variable No. 1, i.e., number of dates, with the
other variables, the following observations are 
suggested. Since 17 of the 19 inter-correlations 
are less than .20, and for all practical purposes approach 0, it would appear that the
number of dates had by a girl over a nine 
month interval may be taken as an independ- 
ent measure of sociality. The two correlations 
which are more than .30 are on the rating of 
sophistication, .39 for faculty rating and .35 
for the student rating. It is possible that this 
may be tautological, i.e., girls who have dates 
are considered sophisticated — sophisticated 
girls are those who have dates. 

The inter-correlations of campus activities 
(variable No. 2) with the other variables 
show a positive correlation of .49 with point-hour ratio, .31 with intelligence, and would
seem to indicate that there is a tendency for 
those qualities or abilities which make for 
academic success to be similar to those qual- 
ities or abilities which make for social success 
with members of one’s own sex, as measured 
by extra-curricular achievement. Interest- 
Attitude Test No. 4 (admirations) shows a 
negative correlation of .20. Since the scores 
on this test are immaturity scores, it is sug- 
gested that there is somewhat of a tendency 
for maturity of interest to be positively cor- 
related with extra-curricular participation. It 
is possible that relatively high correlations between campus activities and personality ratings present another tautology, i.e., the raters were unconsciously influenced by a halo effect and therefore rated favorably those students who were noted in campus affairs. The Thurstone Attitude Scale on Communism is taken as a measure of open-mindedness on a highly controversial issue. A high score on this attitude scale means that a given student is willing to view the question of Communism objectively and impersonally. The positive correlation of .22 suggests that, insofar as this “burning” issue is concerned, students who participate extensively in extra-curricular activities are relatively more open-minded than those who do not participate.


TABLE I
SHOWING CORRELATIONS OF CRITERION VARIABLES WITH ALL OTHER VARIABLES

Variable                                   Number of   Campus       Point-hour
                                           Dates       Activities   Ratio
 1. Number of dates                                    -.04         -.03
 2. Campus activities                      -.04                      .49
 3. Point-hour ratio                       -.03         .49
 4. Intelligence                            .03         .82          .56
 5.                                         .03        -.02         -.03
 6.                                         .04         .07          .04
 7.                                        -.17         .05         -.07
 8.                                        -.03        -.20         -.15
 9.                                        -.09        -.11         -.13
10. Thurstone Att. Scale toward Communism   .03         .22          .28

FACULTY RATINGS
11. Co-operation                           -.12         .51          .57
12. Sagacity                                .02         .62          .81
13. Home background                         .08         .42          .43
14. Emotional maturity                     -.03         .45          .57
15. Sophistication                          .39        -.14         -.11

STUDENT RATINGS
16. Co-operation                           -.08         .66          .52
17. Sagacity                                .01         .57          .52
18. Home background                         .06         .54          .56
19. Emotional maturity                     -.02         .45          .41
20. Sophistication


The inter-correlations of point-hour ratio (variable No. 3) with the variables of number of
dates, campus activities, and Thurstone Atti- 
tude Scale on Communism, have already been 
discussed. The usual positive correlation of 
.56 between point-hour ratio and intelligence 
was found. The usual faculty attitude toward 
point-hour ratios as the chief end and aim of 
life is indicated by the positive correlation of 
.81 between faculty rating on sagacity and
point-hour ratio. 

Since the correlations from the Pressey 
Interest-Attitude Test, both sub-scores and 
total scores, were uniformly low, both nega- 
tively and positively, an item-analysis was 
made. 

The 160 subjects were separated into three 
groups on the basis of three criterion vari- 
ables, 50 high on each variable and 50 low 
on each variable, leaving an undistributed 
middle of 60. Those items which yielded a 
difference of 20 per cent or more between the 
upper and lower groups were selected for 
study. The 20 per cent difference produced a 
critical ratio of 2.8, but since there were a 
large number of differences all in the same 
direction, it was thought that this critical 
ratio was large enough to be reliable. 
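The selection rule and the critical ratio can be sketched in Python. The article does not say which proportions or which standard-error formula produced its critical ratio of 2.8, so this sketch assumes the usual unpooled standard error of a difference between two independent proportions. The item counts are hypothetical; only the item names echo the text.

```python
import math

def critical_ratio(p_upper, p_lower, n_upper=50, n_lower=50):
    """Difference of two proportions divided by its (unpooled) standard error."""
    se = math.sqrt(p_upper * (1 - p_upper) / n_upper + p_lower * (1 - p_lower) / n_lower)
    return (p_upper - p_lower) / se

def select_items(upper_counts, lower_counts, n=50, min_percent=20):
    """Items whose endorsement differs by at least `min_percent` percentage points.

    Counts are numbers of students (out of n per group) marking each item;
    integer arithmetic avoids floating-point trouble at the 20 per cent boundary.
    """
    return sorted(
        item for item in upper_counts
        if abs(upper_counts[item] - lower_counts[item]) * 100 >= min_percent * n
    )

# Hypothetical counts for the 50 high and 50 low students on one criterion
upper = {"science": 30, "reading": 25, "clothes": 12}
lower = {"science": 18, "reading": 15, "clothes": 15}
print(select_items(upper, lower))
print(round(critical_ratio(0.60, 0.40), 2))
```

Under this formula the critical ratio for a fixed 20-point difference depends on the base rates: .60 against .40 gives about 2.04, while smaller proportions give larger ratios.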

The results were about what one would expect: superior students were much more interested in “science,” “reading,” “studying,”
“education,” etc. The students who had a 
large number of dates were much more inter- 
ested in “clothes,” “fashion,” “personal ap- 
pearance,” “men,” “children,” and worried 
much more about examinations. 

Those students who were active in extra- 
curricular activities showed a preference for 
such items as “baseball,” etc., which were 
concerned with athletics and other campus 
affairs. 







CONCLUSION 


We may conclude from this study that 
(1) extra-curricular activities and scholarship 
are positively related, (2) insofar as young 
women are concerned, dates are an independ- 
ent and unique measure of sociality, (3) open- 
mindedness as measured by the Thurstone 
Attitude Scale on Communism seems to be 
slightly correlated with scholarship, intelli- 
gence, and extra-curricular activities, (4) the 
three measures of achievement used in this 
study, point-hour ratio, extra-curricular ac- 
tivities, and dates, seem to carry with them 
their own individual constellation of interests 
and attitudes. The Item-analysis of the 
Pressey Interest-Attitude Test shows that 
students with a high point-hour ratio show 
a preponderance of intellectual interests with 
little worry over examinations. Those who 
have dates show a preponderance of those interests which pertained to personal appearance. Extra-curricular participation shows a preponderance of interests in athletics and related activities. A Factor Analysis (Thurstone
Method) of the inter-correlations of both 
faculty and student ratings on the five per- 
sonality traits showed a double halo effect, 
i.e., there was a marked tendency for both
groups of judges to rate high those students 
whose activities were largely on the home 
campus on all traits except sophistication and 
conversely to rate low on all traits except 
sophistication those students whose claim to 
distinction lay chiefly in their social success 
with the male sex. 


EDUCATIONAL IMPLICATION 


It is hoped that this study will aid in 
dispelling a part of the fog of unsupported 
opinion in regard to the supposed antipathy 
of intelligence and scholarship on the one 
hand and extra-curricular activities and dates 
on the other hand. It appears that young 
women may participate in a wide variety of 
extra-curricular activities and at the same 
time achieve academic distinction. Since the 
correlations between dates on the one hand 
and intelligence and point-hour ratio on the 
other approach 0, the writer suggests that
the studious, intelligent, young woman may 
well offer effective competition to her more 
social and less academic sisters in the field 
of heterosexual social endeavor. 











THE McCAULEY TETRAHEDRON TEST 


LESLIE D. HAYES AND CHARLES A. DRAKE
West Virginia University 


This discussion of the McCauley Test is 
presented primarily because of the bearing 
of an analysis of the test upon the design 
of other tests. From this analysis certain 
principles are deducible which should be of 
value to all who undertake the construction of similar tests. The results indicate that it does not follow that a test should be summarily discarded because it does not meet
the usual criteria of reliability and validity. 
Often the attempt to measure the test against 
the criteria will reveal difficulties in the way 
of obtaining higher values and indicate the 
modifications that should be made in design, 
administration, and interpretation. 


The test, as originally designed by 
Mr. W. J. McCauley¹ of the University of
Arizona, consists of six right tetrahedrons. 
These were cut from a block of wood 
1’x14"x2¥4” by sawing through the diag- 
onally opposite sets of edges, making three 
separate cuts with the saw. Each student was 
given a set of the six blocks numbered from 
one to six and a set of 144 drawings showing 
every possible position these blocks could oc- 
cupy relative to the planes of projection, each 
drawing showing three views of a block in 
orthographic projection. The task of the 
examinee was to choose the block represented 
in each drawing, covering as many drawings 
as possible in a given length of time. 
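The figure of 144 drawings is consistent with six blocks each shown in 24 orientations, the number of rotations that keep a block's edges parallel to the coordinate axes and hence to the planes of projection. The article does not spell this out, so treat the following Python sketch as an assumed accounting: it enumerates those rotations as signed permutation matrices with determinant +1. The function names are illustrative.

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def axis_aligned_rotations():
    """Rotations that carry the coordinate axes onto coordinate axes.

    These are the 3x3 signed permutation matrices with determinant +1.
    """
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[0, 0, 0] for _ in range(3)]
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row][col] = sign
            if det3(m) == 1:
                rotations.append(m)
    return rotations

orientations = len(axis_aligned_rotations())
print(orientations)      # 24
print(6 * orientations)  # 144
```

Of the 48 signed permutation matrices, the half with determinant -1 are reflections, which a solid block cannot undergo; that is why the count is 24 rather than 48.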


The hypotheses upon which the tests were 
based were that one of the fundamental traits 
of the engineer is ability to visualize in two 
or three dimensions, to perceive relationships 
in both, and to pass from one to the other. 
The motive was to provide a means for the 
objective measurement of this trait or set 
of traits. The hypothesis received support 
from a study made by the Engineering Foun- 
dation in 1930. Following this report several 
committees of the Society for the Promotion 
of Engineering Education undertook studies 
bearing upon the measurement of the traits 
in question. Since 1936 the latter effort has 
centered in a special committee of the Engin- 

¹ McCauley, W. J., “The Tetrahedron Test of Power to Visualize,” The Journal of Engineering Education, 23:8:624-627, April, 1933.


eering Drawing Division of this society, be- 
cause of the belief that descriptive geometry 
is the one subject that both develops and 
utilizes these traits. Prof. C. V. Mann, of the 
Missouri School of Mines, has been chair- 
man of this committee coordinating experi- 
mental work in testing the hypotheses. 

The untimely death of Mr. McCauley in 
1935 might have ended further development 
of his test had further experimentation not 
been taken up by Professor Mann’s commit- 
tee. Ten institutions have given the test to 
more than 2,000 students, but few of the results have thus far been published. McCauley’s
report in 1933 is the last and only compre- 
hensive summary of findings. 

Mann² and McCauley found r’s of .70 between scores on the test taken during the
second half hour of testing and grades in 
descriptive geometry awarded on the basis of 
objective tests. They found scores on the 
second half hour gave somewhat better cor- 
relations than scores on the first half hour 
and on the total hour of testing. 

Our results are based on a period of only
one-half hour, for 95 students. The correla- 
tion between scores for two administrations 
of the test, in February and again in May, 
is .41, a result that is disappointingly low 
and that would ordinarily justify rejection 
of the test. While McCauley reported an odd- 
even coefficient of reliability of .94, this re- 
sult must be interpreted with care. In general, 
time-limit tests having items of equal diffi- 
culty show misleadingly high odd-even cor- 
relations because of the application and 
measurement of approximately equal speeds 
per item on odds and evens, thus making the 
correlations reflect speed in performance. 
Hence, for speed tests, the only legitimate 
measurement of reliability is that derived 
from two forms of the test applied at different 
times. Mann has found the reliability suffi- 
ciently high to justify continuation of the 
experimentation, although he does not give 
a correlation coefficient. 
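The odd-even objection can be illustrated with a small simulation. This is a sketch under a deliberately extreme, hypothetical assumption: a pure speed test on which every item an examinee reaches is answered correctly, so that examinees differ only in how many of the 144 items they reach within the time limit (the attempt counts below are invented). The odd and even half-scores then never differ by more than one point, and the Spearman-Brown estimate comes out close to 1.0 even though nothing has been learned about the stability of the trait from one sitting to the next.

```python
import math

def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to the length of the full test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical: number of the 144 items each examinee reaches in the time limit.
# On this idealized speed test every reached item is answered correctly.
items_reached = list(range(20, 141, 5))

# Items are taken in order 1, 2, 3, ..., so of n items reached,
# ceil(n/2) are odd-numbered and floor(n/2) are even-numbered.
odd_scores = [(n + 1) // 2 for n in items_reached]
even_scores = [n // 2 for n in items_reached]

r_half = pearson(odd_scores, even_scores)
r_full = spearman_brown(r_half)
print(f"odd-even r = {r_half:.3f}; Spearman-Brown estimate = {r_full:.3f}")
```

A retest coefficient, by contrast, compares two separate sittings, which is why the text treats it as the legitimate measure for a speed test.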

Our correlations between the first application
of the test and grades in descriptive
geometry, and between the second application
of the test and the same grades, are .25 and
.24, respectively. These results are too low to
justify the use of the test for sectioning a
class prior to instruction. The near equality of
the two coefficients also suggests that the trait
measured by the test is not appreciably affected
by the instruction in descriptive geometry that
was given during the interval between tests.
Distributions of the scores on the two tests
are shown in Table I.

² Letter from Professor C. V. Mann dated Jan. 17, 1939.


TABLE I 


DISTRIBUTIONS OF SCORES 


Scores February May 
[Score intervals and frequencies illegible in the source; totals: February 95, May 95.]


Inspection of this table reveals a dispro- 
portionately large group of scores of zero and 
below. Part of this is due to the method of 
scoring the test. This method assumes that 
by guessing alone, an examinee should be 
able to get right answers for one sixth of the 
number of items he tries, since there are only 
six blocks among which to choose. To offset 
the effect of guessing, the final scores were
computed from the formula: score = 5R − W,
where R is the number right and W the number
wrong. Of course, only the average of the scores
made by guessing alone will fall at zero; the
individual guessing scores will be distributed
approximately normally around this point. This
accounts for the range of scores below zero, but it
also implies that a similar range of scores above
zero is attributable to the same cause.
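The correction can be checked directly. The sketch below is a minimal illustration, assuming six equally likely alternatives per item (as in the text) and a hypothetical group of examinees who guess blindly on 60 items each; `corrected_score`, `expected_guessing_score`, and `simulate_guessers` are illustrative names, not part of the test materials. The general form (k − 1)R − W reduces to 5R − W when k = 6. The expected corrected score for pure guessing is exactly zero, while individual simulated scores scatter on both sides of zero, which is the pattern described above.

```python
import random
from fractions import Fraction

ALTERNATIVES = 6  # six blocks among which to choose

def corrected_score(right, wrong, k=ALTERNATIVES):
    """(k - 1) * right - wrong; with k = 6 this is the 5R - W formula."""
    return (k - 1) * right - wrong

def expected_guessing_score(items_tried, k=ALTERNATIVES):
    """Exact expected corrected score when every item tried is a blind guess."""
    p_right = Fraction(1, k)
    expected_right = p_right * items_tried
    expected_wrong = (1 - p_right) * items_tried
    return corrected_score(expected_right, expected_wrong, k)

def simulate_guessers(n_examinees, items_tried, k=ALTERNATIVES, seed=1939):
    """Corrected scores for examinees who guess blindly on every item they try."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_examinees):
        right = sum(1 for _ in range(items_tried) if rng.randrange(k) == 0)
        scores.append(corrected_score(right, items_tried - right, k))
    return scores

# Hypothetical: 2,000 pure guessers, each trying 60 items.
scores = simulate_guessers(n_examinees=2000, items_tried=60)
mean = sum(scores) / len(scores)
print("expected score for pure guessing:", expected_guessing_score(60))
print("simulated mean:", round(mean, 2), "range:", min(scores), "to", max(scores))
```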


The foregoing consideration suggests that 
higher correlations between the two tests and 
between the tests and grades might be ob- 
tained if this group of scores so strongly 
affected by guessing was removed from the 
calculations. When the 41 cases that seemed
least affected by guessing were handled as a
separate group, they gave correlations of .37
between the two sets of scores on the tests,
.24 between the first test and grades, and
.20 between the second test and grades,
results but little different from those found
previously for the whole group.

It is always possible to attribute a low
correlation between a test and grades in a
course to the well-known fact that the latter
are themselves usually unreliable. If the
correlations between scores on successive
examinations in a course, and between grades in
the same course for two terms or semesters, are
usually no better than .75, as is the case, then
correlations between such scores or grades and
scores on a test that seems to be related to the
abilities called for in the course will seldom be
much higher than that figure. Where the grades are
even less reliable, as is again often the case, the
correlations between grades and test scores will
tend to be lower still.
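The argument can be made quantitative with the standard attenuation ceiling of classical test theory (a textbook result, not one stated in the article): the correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch plugs in the .41 retest coefficient reported above and treats .75 as the grade reliability; both are illustrative inputs, not a reanalysis of the authors' data, and `validity_ceiling` is an illustrative name.

```python
import math

def validity_ceiling(test_reliability, criterion_reliability):
    """Largest correlation two measures can show, given their reliabilities."""
    return math.sqrt(test_reliability * criterion_reliability)

# .41: February-May retest coefficient reported above.
# .75: grade reliability the text describes as typical (illustrative).
ceiling = validity_ceiling(0.41, 0.75)
print(f"ceiling on test-grade correlation: {ceiling:.2f}")
```

Under this formula the ceiling is about .55, so the observed coefficients of .24 and .25 fall well below even the limit the two reliabilities impose.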

While this may explain the low correlations 
found between the grades and the test scores, 
it does not explain the low correlations be- 
tween the scores on the two applications of 
the test. Some other factor must be responsible
for this latter result. It cannot be due to
intelligence as measured by the A.C.E. test,
since those who gained on the second application
of the test (61% of the whole group) had an
average percentile score of 43 on the national
norms, whereas the 39% who had losses or no
change had an average percentile score of 45.
Variabilities in the A.C.E. scores of the two
groups were similar (see Table II).





TABLE II 
DISTRIBUTION OF GAINS AND LOSSES ON RETEST 


Amount in          Frequency    Aver. Percentile
Score Points                    Score on the A.C.E. Test

[Rows for gains largely illegible in the source; the 41 to 50 row shows a frequency of 1.]
-9 to 0            20
-19 to -10         14           45
-29 to -20          2
-39 to -30          1


A study of the relative difficulty of the
several blocks yields no basis for an explanation.
The blocks numbered from one to six
were identified with the appropriate drawings
in 45, 42, 44, 48, 41, and 45% of the times
they were tried, respectively, and each appeared
24 times among the 144 drawings
arranged in random order in the testing procedure.
There was no indication of any
significant differences in difficulty among
either the blocks or the drawings. The random
order was itself free from any discernible
bias.

There is another possible explanation for 
the low correlations which follows from the 
observation that 39% or more of the group 
were not able to improve their scores on the 
second trial. A test of this kind can usually 
be performed by insight, by systematic trial 
and error, and by random trial and error. 
It is not unusual to observe an examinee 
starting a test with random trials, then 
moving on to systematic trials, and finally 
arriving at such insight that selection of the 
pieces is made from perceptual cues without 
trial manipulations of the pieces themselves. 
Where this order of procedure is observed, it
implies a defect in the test itself, in the
giving of instructions, or in both. The effect
of the failure to arrive at insight is to impair
both the reliability and the validity of the
test. This seems to be the case in our
experience with the McCauley test.


Prof. Mann has said that the test appears
to be too difficult for freshmen. Our
observation is that the instructions are not 
comprehended by many of the students at 
the end of the ten-minute period allowed for 
reading these directions and trying out four 
of the items. The obvious line of improve- 
ment here is to lengthen the practice period 
and to simplify the directions, making sure 
that every examinee understands the relation- 
ships between the drawings and the cor- 
responding tetrahedrons. 


However, this may not be enough. The 
initial difficulty of perceiving the relationships 
between drawing and solid may still remain, 
since a tetrahedron is such a complex figure 
that twenty-four drawings are made from it 
without including oblique views. It would 
seem to be better, in designing tests for three- 
dimensional perception, to begin with very 
simple solids well within the perceptual grasp 
of the most poorly endowed examinee. The 
successive solids might then progress in order 
of complexity with fewer alternatives from 
which the examinee must choose. 


[Vol. 73 No. ? 


In summary, it seems to us that the 
McCauley test in its present form has these 
defects: 


1. Its items do not have a progressive order 
of difficulty. 


2. It is often begun before the examinee 
comprehends the task—before he has 
gained insight. 


3. It begins at a level of difficulty much 
too high for many to whom it is given. 


4. It offers too many alternatives for effec- 
tive trial-and-error performance. 


5. Its time limit is too short; it should be
increased to an optimum established by experiment.


To remedy these defects it will be neces- 
sary to redesign the test, probably embodying 
some of the tetrahedrons in a longer series 
of solids. The principles to guide such design, 
also applicable to the original design of any 
similar test, follow from the above list of 
defects: 


1. It should have a series of items gradu- 
ated in a progressive order of difficulty from 
very easy to very difficult. 

2. Preliminary instructions and adequate 
practice in a trial period must continue until 
the examinee demonstrates by several suc- 
cessful performances that he comprehends 
the task. 


3. Its first and easiest items should be well 
within the ability of the most poorly en- 
dowed examinee to whom it is given. Con- 
versely, its most difficult items should tax 
the ability of the most highly endowed 
examinee. 


4. The alternatives for each choice should
be limited to three or four, as among four
rectangular solids, not counting such
obvious incongruities as triangular pyramids,
frustums of cones, and similar solids which
might also be items in the test.

5. The maximum time limit should allow 
not more than three persons among one 
thousand to complete the whole test. 
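Principle 5 can be turned into a number once tryout data are available. The sketch below assumes, hypothetically, that times to complete the whole test are normally distributed, and uses invented tryout figures (mean 60 minutes, standard deviation 10 minutes); it finds the time by which only three examinees in a thousand would be expected to finish. `time_limit_for_completion_rate` is an illustrative name.

```python
from statistics import NormalDist

def time_limit_for_completion_rate(mean_minutes, sd_minutes, rate=0.003):
    """Time by which only `rate` of examinees are expected to finish every item.

    Assumes completion times for the whole test are normally distributed.
    """
    return NormalDist(mean_minutes, sd_minutes).inv_cdf(rate)

# Hypothetical tryout figures: mean completion time 60 min, S.D. 10 min.
limit = time_limit_for_completion_rate(60, 10)
print(f"time limit: {limit:.1f} minutes")
```

With these figures the limit falls about 2.75 standard deviations below the mean completion time. Real completion times may well be skewed, in which case the corresponding percentile of observed tryout times would be the safer choice.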

Prior experimentation by one of the present
writers³ with perception testing in two and
three dimensions supports the observation of
McCauley and Mann that such tests give
results poorly correlated with intelligence test
scores, and thus seem to be testing a function
that is different from general intelligence as
measured by the usual tests. Results from
the use of such tests in selecting inspectors
for work in factories indicate that intelligence
may be disregarded as a factor in such
selection, except, possibly, in a few cases in
which very high intelligence is known to
contribute to early and rapid labor turnover.
However, in drawing such conclusions, one
must be on guard against the confusion
that ensues from applying labels to the
functions that seem to be inferred from the tests.
It should be borne in mind that Spearman
contends that tests of perception are really
tests of intelligence and that the usual
intelligence tests are not.

³ Drake, C. A., Inspection for Inspectors, American Machinist, 82:17:766-768, August 24, 1938.

Perception testing is a fertile field for 
experimentation. The need is for more and 
better tests and for experimenters who can 
approach their task with reasonable freedom 
from the conventional notion that perception 
is nothing but a function of general intelli- 
gence as usually measured. 













THE RELATIONSHIP OF SELF-RATING AND CLASSMATE 
RATING ON PERSONALITY TRAITS 


MARGARET J. DRAKE, SYDNEY ROSLOW, AND GEORGE K. BENNETT¹
New York City


The difficulty of determining the validity
of personality tests has long been recognized.
In the case of measures of abnormal tendencies
the standardization has often been based
on the discrimination between “normals” and
those displaying the given syndrome.² This
method is less appropriate for scales designed
to measure traits which are not parallel to
some psychiatric classification. The questionnaire
used in this study, An Inventory of
Activities and Interests,³ is an example of the
non-psychiatric personality measure. This
inventory purports to measure social initiative,
self-determination, financial resourcefulness
(“economic self-determination”), and adjustment
to the opposite sex, as well as a combination
of these, denoted as “overall personality.”
The author of this inventory,
Henry C. Link, defines personality as “the
extent to which the individual has learned to
convert his energies into habits and skills
which interest and serve other people.”⁴ The
degree to which Link’s hypothesis is substantiated
by this inventory should be determinable
by a comparison of the scores on
the several scales (self-ratings) with ratings
by intimate acquaintances. Two previous
studies, one by Link himself, reported in the
manual accompanying the form, and the other
by an independent investigator, W. A.
Thompson,⁵ have demonstrated the existence
of some correspondence between scores and
external evaluations of personality. As a
criterion group possessing effective personality,
Link used pupils whom teachers selected as
regarded by their classmates as leaders. High
scores in social initiative and overall personality
were found to be characteristic of this group.
Thompson reports that the deans’ selection of
the children possessing the most and the least
of each trait agreed essentially with the test
results. In both these investigations the ratings
involved were those of teachers.

¹ The psychometric data were obtained by Mrs. Drake, Chairman, Guidance Bureau, James Monroe High School. The experiment was planned and directed by Bennett and executed by Roslow. The authors are indebted to Dr. Henry E. Hein, Principal of the James Monroe High School, for his cooperation in this study.

² Among the questionnaires so validated are the Psycho-Somatic Inventory (Ross A. McFarland and Clifford P. Seitz, “A Psycho-Somatic Inventory,” Journal of Applied Psychology, 22, 1938, 327-339) and the Humm-Wadsworth Temperament Scale (American Journal of Psychiatry, 92, p. 163 ff.).

³ Also known as the P.Q. or Personality Quotient Test, by H. C. Link, The Psychological Corporation, New York.

⁴ Manual for the P.Q. (1938 revision), op. cit.

⁵ William A. Thompson, “An Evaluation of the P.Q. (Personality Quotient) Test,” Character and Personality, 1938, 6,


PROBLEM 


In this experiment an attempt is made to 
compare the individual’s score on each scale 
of the Inventory of Activities and Interests 
with the ratings of coeval associates. If the 
inventory measures the possession of the 
habits of personality along the axes of its 
several scales, there should be a positive re- 
lationship between the responses of the 
individual and the judgments of his fellows. 
In this instance, self-ratings were determined 
from the responses to the items of the ques- 
tionnaire and classmate ratings from the 
composite judgment of other members of the 
class. 


SUBJECTS 


The subjects consisted of the members of 
three honor classes at the James Monroe 
High School, New York City. The students 
in these classes represent a superior high 
school population (mean I.Q. 127). No selection
factor other than chance determined the
division of the students among these three 
sections. The enrollment in these sections 
included 70 boys and 44 girls in the tenth 
grade. 

The population of these classes was 
unusually stable. Each pupil had been a 
member of his present section for at least 
one and one-half school years. Because of the 
wide area of the city from which these 
classes are drawn, association outside of 
school is infrequent. 


METHOD 


To obtain the ratings of classmates for
each individual, a question was prepared,
designed to epitomize the characteristic
supposedly measured by each scale of the
inventory. These questions and designations
of the corresponding traits are listed below:


1. Is the person named more popular than
   most others in the class?
   Trait: (X) Overall personality, from which the
   Personality Quotient is determined.

2. Is the person named more friendly and
   sociable than most others in the class?
   Trait: (SI) Social Initiative.

3. Does the person named know what he wants
   to do next better than most others in the
   class?
   Trait: (SD) Self-determination.

4. Is the person named more interested in
   earning or saving money than most others
   in the class?
   Trait: (ESD) Economic Self-determination.

5. Does the person named get along with
   members of the opposite sex better than the
   average of the class?
   Trait: (SX) Adjustment to the opposite sex.


For each class a number of mimeographed 
answer sheets were prepared. Each sheet con- 
tained an alphabetical list of the names of 
the members of the class. Opposite each name 
appeared “yes”, “no”, and “?”. One such 
sheet, numbered to indicate the question put, 
was distributed to each member of the class. 
The students were then given these instruc- 
tions: 


“On this sheet are the names of the members
of this class. After each name you
will see ‘Yes’, ‘No’, and a question mark.
The question to be answered for each
name, except your own, is: ‘Is the person
named more popular than most others in
the class?’ If your answer is Yes, draw a
circle around Yes. If your answer is No,
draw a circle around No. If you can’t
answer Yes or No, draw a circle around the
question mark. Remember the question is:
‘Is the person named more popular than
most others in the class?’” (Question 1)


At this point the administrator wrote the
question on the blackboard. When the students
had completed answering this question
for each name listed, the answer sheets were 
collected. A second sheet was distributed and 
the instructions were repeated for the second 
question. This procedure was continued for 
the remaining three questions. 

A cross tabulation by question was then
made of the frequency of “yes”, “no”, and
“?” responses for each name. Arbitrary
weights of 2, 0, and 1 were assigned respectively
to these responses, and a “rating
score” was obtained for each individual for
each question.
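The scoring of classmate ratings is a weighted count. The sketch below applies the weights stated in the text (2 for “yes”, 0 for “no”, 1 for “?”) to a hypothetical set of six answers about one pupil; `rating_score` is an illustrative name, not part of the original procedure.

```python
from collections import Counter

# Weights stated in the text: "yes" = 2, "no" = 0, "?" = 1.
WEIGHTS = {"yes": 2, "no": 0, "?": 1}

def rating_score(responses):
    """Weighted sum of classmates' answers about one pupil on one question."""
    tally = Counter(responses)
    unexpected = set(tally) - set(WEIGHTS)
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    return sum(WEIGHTS[answer] * count for answer, count in tally.items())

# Hypothetical answers from six classmates about one pupil on Question 1.
answers = ["yes", "yes", "?", "no", "yes", "?"]
print(rating_score(answers))
```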

During the next week the Inventory of 
Activities and Interests was administered by 
the guidance counsellor of the school who had 
also supervised the classmate rating program. 
No mention was made to the pupils of any 
possible connection between ratings and 
questionnaire, nor were any results made 
available to the students. The completed 
inventories were scored and checked for each 
scale. 


RESULTS 


In each of two sections, 38 pupils were 
present during the session in which the ratings 
of classmates were made. In the third section 
32 were present, although the enrollment was 
also 38. Since each pupil answered each ques- 
tion in respect to every member of the class 
except himself, for 76 of the cases classmate 
ratings were made by 37 individuals. Of the 
remaining 38 cases, 32 were rated by 31 classmates
and 6 by 32. The rating totals for the
members of the third group were multiplied
by appropriate constants to compensate for
the smaller number of raters.
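The article does not state the constants used for the third section. A natural choice, assumed here, is to multiply each total by 37 divided by the number of raters actually present (37/31 or 37/32), which puts every pupil's total on the same 37-rater scale; `adjusted_total` and the raw totals below are hypothetical.

```python
FULL_RATERS = 37  # classmates available to rate 76 of the 114 pupils

def adjusted_total(raw_total, n_raters, full=FULL_RATERS):
    """Rescale a rating total to a 37-rater basis.

    Assumption: the "appropriate constant" is full / n_raters (37/31 or 37/32);
    the article does not state the constants it actually used.
    """
    return raw_total * full / n_raters

# Hypothetical raw totals for two pupils in the third section.
print(round(adjusted_total(50, 31), 2))
print(round(adjusted_total(50, 32), 2))
```

One check on this choice: a pupil marked “yes” by all 31 raters (a raw total of 62) is rescaled to 74, the maximum possible with 37 raters.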

Consistency of the ratings within each class 
for each question was determined by a divi- 
sion of the raters into two equal groups and 
the computation of the correlation between 
the two series of total ratings. These coeffi- 
cients, corrected by the Spearman—Brown 
formula for the full number of raters and 
averaged by Fisher’s z function, are given in 
Table I. 
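The two corrections named in the paragraph can be sketched as follows. The Spearman-Brown formula steps a half-group correlation up to the full number of raters (a factor of 2, since each half holds half the raters), and Fisher's z transformation (z = atanh r) is used to average the coefficients from the three sections before transforming back to r. The half-group correlations below are hypothetical, not values behind Table I.

```python
import math

def spearman_brown(r, factor=2):
    """Predicted correlation when the number of raters is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)

def fisher_mean(rs):
    """Average correlations via Fisher's z = atanh(r), then transform back to r."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Hypothetical half-group correlations for one question, one per class section.
half_group_rs = [0.80, 0.70, 0.75]

# Each half contains half the raters, so step up by a factor of 2.
full_rs = [spearman_brown(r) for r in half_group_rs]
print([round(r, 3) for r in full_rs], round(fisher_mean(full_rs), 3))
```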


TABLE I 


THE CONSISTENCY OF RATINGS OF 114 INDIVIDUALS
BY THEIR CLASSMATES (31 TO 37 RATERS FOR EACH
INDIVIDUAL) ON EACH OF THE QUESTIONS


Question                                        r
1. Is the person named more popular than
   most others in the class?                   .95
2. Is the person named more friendly or
   sociable than most others in the class?     .83
3. Does the person named know what he
   wants to do next better than most
   others in the class?                        .85
4. Is the person named more interested in
   earning or saving money than most
   others in the class?                        .79
5. Does the person named get along with
   members of the opposite sex better than
   the average of the class?                   .93
The consistency of these ratings appears to
be reasonably satisfactory, with the possible
exception of question 4, dealing with earning
and saving money. Lower consistency for this
question may perhaps be explained by the
paucity of extra-school association among
these students. Financial astuteness will
probably find relatively less opportunity for
display within the school than will the other
traits.

The correlation between ratings of the
several questions has been determined for two
of the sections. These coefficients are presented
in Table II.


TABLE II*


AVERAGE COEFFICIENTS OF CORRELATION (FISHER’S z)
BETWEEN COMPOSITE RATINGS OF 76
STUDENTS OF BOTH SEXES


Question     2      3      4      5
1           .69    .73    .16    .70
2                  .48    .20    .73
3           [illegible in the source]
4           [illegible in the source]

* It may be remarked in passing that this pattern of coefficients is similar to that given by the coefficients between scales in the manual for the Inventory of Activities and Interests (Link, op. cit., p. 10).


At the time the Inventory of Activities
and Interests was administered, several of the
students were absent, reducing the total number
to 98. Separate parallel forms of the
questionnaire are used for boys and girls.
Since scores from these forms are not directly
comparable, the sexes are treated independently
from this point on.

Means and standard deviations of inventory
scores for the experimental group are contrasted
in Table III with similar data for
comparable grades reported in the manual.
Significantly higher mean scores are obtained
by both sexes of the experimental group in
Self-Determination and by girls in Overall
Personality (P.Q.).

It seems probable that these differences are
unimportant for the present experiment,
especially in view of Thompson’s statement⁶
that persons with high P.Q. Test scores tend
to have a slight advantage in academic
competition, and of the fact that the norms
reported in the manual are based on both
9th and 10th grades.


Table IV shows the coefficients of correlation
between the score on each scale of the
Inventory of Activities and Interests and the
composite rating by classmates on the
corresponding question.

⁶ Thompson, op. cit.


TABLE III


MEANS AND STANDARD DEVIATIONS OF EXPERIMENTAL AND STANDARDIZATION
GROUPS FOR BOTH SEXES


                 Boys                              Girls
Group     Experimental    Standardization   Experimental    Standardization
Scale     M      S.D.     M      S.D.       M      S.D.     M      S.D.
P.Q.      103.3  18.0     100.0  17.0       113.4  14.3     100.0  17.0
S.I.      (partly illegible; legible entries 67.4, 15.4, 66.6, 11.9, 60.8, 13.0)
S.D.      65.8   13.2     59.2   12.8       72.0   11.2     62.5   12.5
E.S.D.    34.8   9.6      36.3   8.4        34.3   6.0      34.7   6.8
S.X.      ...    ...      22.4   7.0        31.2   5.8      26.8   6.6
          N = 55          N = 462           N = 43          N = 430


TABLE IV


COEFFICIENTS OF CORRELATION BETWEEN SCORES ON EACH SCALE OF THE INVENTORY AND
COMPOSITE RATINGS ON THE CORRESPONDING QUESTION


Boys, N = 55
          Q1 vs P.Q.  Q2 vs S.I.  Q3 vs S.D.  Q4 vs E.S.D.  Q5 vs S.X.
r           .60         .45         .51         .14           .65
r*          .62         .49         .55         .16           .67
P.E.        .06         .07         .06         .09           .05

Girls, N = 43
r           .45         .41         .35         .19           .55
r*          .46         .45         .38         ...           .57
P.E.        .08         .08         .09         .10           .07


* Corrected for unreliability of criterion only: r* = r / (r_cc)^½, where r_cc is the consistency of the classmate ratings on the question.
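The footnote's correction for unreliability of the criterion can be checked against the printed values. The sketch divides each raw coefficient of Table IV by the square root of the corresponding rating consistency from Table I (.95, .83, .85, .79, .93). On these inputs it reproduces the printed corrected coefficients for the boys and for the legible girls' entries; the girls' Question 4 entry, illegible in this copy, comes out at about .21 under the same formula (a computed value, not one read from the source). `correct_for_criterion` is an illustrative name.

```python
import math

# Consistency of the classmate ratings for Questions 1-5 (Table I).
CRITERION_RELIABILITY = {1: 0.95, 2: 0.83, 3: 0.85, 4: 0.79, 5: 0.93}

# Raw coefficients from Table IV (Question -> r).
RAW_R = {
    "boys": {1: 0.60, 2: 0.45, 3: 0.51, 4: 0.14, 5: 0.65},
    "girls": {1: 0.45, 2: 0.41, 3: 0.35, 4: 0.19, 5: 0.55},
}

def correct_for_criterion(r, criterion_reliability):
    """Correction for unreliability of the criterion only: r / sqrt(r_cc)."""
    return r / math.sqrt(criterion_reliability)

corrected = {
    sex: {q: round(correct_for_criterion(r, CRITERION_RELIABILITY[q]), 2)
          for q, r in coefficients.items()}
    for sex, coefficients in RAW_R.items()
}
print(corrected)
```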

It will be observed that a significant positive
relationship exists between each scale and the
corresponding question, with the exception of
Scale E.S.D. and Question 4. There is also
an apparent tendency for the coefficients to
be higher for the boys than for the girls.
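The third row of Table IV is consistent with the probable error of r, P.E. = .6745(1 − r²)/√N, the customary significance measure of the period. The sketch below computes it for the corrected coefficients, with N = 55 for boys and N = 43 for girls, and reproduces the printed values; the girls' Question 4 coefficient of .21 is the value computed from the attenuation formula, not a printed figure. Dividing each r by its P.E. shows why only the E.S.D. coefficients fail to stand out clearly from zero. `probable_error` is an illustrative name.

```python
import math

def probable_error(r, n):
    """Probable error of a product-moment r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# Corrected coefficients for Questions 1-5; the girls' Question 4 value (.21)
# is computed from the attenuation formula, not read from the printed table.
CORRECTED = {"boys": (55, [0.62, 0.49, 0.55, 0.16, 0.67]),
             "girls": (43, [0.46, 0.45, 0.38, 0.21, 0.57])}

pe_table = {}
for sex, (n, rs) in CORRECTED.items():
    pes = [probable_error(r, n) for r in rs]
    pe_table[sex] = [round(pe, 2) for pe in pes]
    ratios = [round(r / pe, 1) for r, pe in zip(rs, pes)]
    print(sex, "P.E.:", pe_table[sex], "r / P.E.:", ratios)
```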

If these coefficients were obtained between 
the scales and some more usual criteria it 
would be necessary to conclude that only 
slight correspondence existed between the 
scores and external measures of these traits. 
However, two factors peculiar to the present 
technique probably reduce the validity of the 
criterion. First, it is obviously impossible to 
convey in one sentence to a relatively naive 
individual the essential characteristics of a 
trait. Second, although the composite ratings 
reflect the opinions of many judges, the 
response of each judge is only the answer to 
a single question. The extent of correlation 
between one question and any other variable 
is necessarily restricted by its low discrimina- 
tory capacity. 

The lack of significant correlation between 
Question 4 and Scale E.S.D. was not entirely 
unexpected. It has already been mentioned 
that the pupils were poorly acquainted with 
each other outside of school. The observa- 
tions of classmate behavior with respect to 
economic resourcefulness could for the most 
part have been made only in extra-school 
activities. There are two possible explanations
of the lower correlation between inventory
scores and classmates’ ratings for the
girls. In the first place, the higher mean
scores and smaller standard deviations (as
shown in Table III) suggest that the girls
may be a more select group with a restricted
range. In the second place, the rating of girls’
personality may be less accurate because, as
Thompson has suggested,⁷ their behavior is
less overt than the behavior of boys.


SUMMARY AND CONCLUSIONS 


A comparison was made between scores 
on each scale of the Inventory of Activities 
and Interests and classmate ratings on ques- 
tions designed to express the central aspect of 
each trait. Fifty-five boys and forty-three girls 
served as subjects for both ratings and 
questionnaire. The consistency of composite
ratings by classmates ranged from .79 to .95.


The correlation between self-rating by 
means of the inventory and the rating of 
classmates for four of the five traits ranged 
between .49 and .67 for boys and between 
.38 and .57 for girls. The fifth trait, economic 
self-determination, was less closely related to 
classmates’ opinion for both sexes. This result 
is believed to be a function of the lack of 
extra-school association among the subjects. 


Evidence is given that significant positive 
correlation exists between four scales of the 
P.Q. Test and the judgments of classmates. 
The nature of the criterion precludes the 
usual interpretation of these coefficients. 

⁷ Thompson, op. cit.














A STUDY OF THE VALIDITY OF SOME CARDIO-VASCULAR TESTS¹ ²
LEONARD A. LARSON 


Springfield College 
Springfield, Mass. 


INTRODUCTION 


Since 1900 many cardio-vascular tests
have been developed. These tests were the
natural result of the rapid advance in the
knowledge of the physiology of the
circulatory-respiratory systems. The purpose
in developing these tests was to secure
information about the functional efficiency
of the circulatory-respiratory systems.
The test items used during this period include
the normal values for systolic, diastolic, and
pulse pressures; normal breath-holding ability
and after-exercise breath-holding ability;
and the reaction of the circulatory-respiratory
systems to graded exercise. These items
have been combined into test batteries, by
clinical judgment and by statistical methods,
to give more meaningful physiological
information than a single test item gives.

Many physiologists have questioned the 
validity of cardio-vascular tests. Many have 
raised questions about the relationship be- 
tween the various tests. Are they specific 
tests, or do they all give information as to 
the general body fitness? Do they indicate 
physiological changes in training and illness? 
It is the purpose of this research to attempt 
to answer these questions. 


Purpose of Research 


The purpose of this research is fourfold: 
(1) to determine the consistency between 
cardio-vascular tests in grading physiological 
efficiency, (2) to determine the significance 
of the various selected tests in indicating the 
physiological changes as the result of training
or illness, (3) to determine the validity
of the eleven selected tests, and (4) to com- 
bine significant test items into the best test 
battery as an indicator of physiological 
efficiency. 


¹ An abstract of a dissertation submitted in partial fulfillment of the requirements for the Ph.D. degree in the School of Education, New York University, 1938.

² This is a continuation of a series of researches conducted by Dr. J. H. McCurdy and L. A. Larson in cardio-vascular efficiency.


Statement of Problems 


The research includes the application of 
eleven cardio-vascular tests to four typical 
physiological groups of subjects of the college 
age range (seventeen to twenty-four). Two
of these groups are typically efficient, or
“Good”, in respect to cardio-vascular efficiency:
the varsity (60) and Olympic (40) swimmers,
and the Springfield College freshmen (500).
Approximately three-fourths of the
freshman group are physical education
students. The third group is represented by one
hundred thirty-eight Springfield College
infirmary patients who had “Poor” cardio-vascular
efficiency. These subjects were in
bed for two or more days with elevated temperature
resulting from some organic ailment, such as
colds, grippe, etc. The examinations were
made when body temperature was normal.
The case studies constitute the fourth group.
Two groups of subjects are included in the 
case study analysis; the first consists of 
twenty-seven subjects examined in the fall, 
and again after six weeks of training for 
swimming; the second consists of seventy- 
three subjects examined in the fall, and again 
after confinement to the infirmary for two or 
more days. These four groups of subjects— 
varsity and Olympic swimmers, infirmary, 
college freshmen, and case study—are used 
to answer the four questions suggested as the 
purposes of this research. 

The study is limited to male students of 
the college age range from seventeen to 
twenty-four. Only organically sound subjects 
were included in the four groups. Those 
with heart defects which affect function were 
eliminated. The “Poor” groups differed from 
the “Good” groups in some functional change 
in the circulatory-respiratory systems. 


EXPERIMENTAL PROCEDURE 


Eleven cardio-vascular tests were selected
for the examination of the four groups of
subjects. These tests were: McCloy Test,
Barach Test, Stone Test, Tigerstedt Test,
Basal Metabolic Test, Difference between
Standing and Horizontal Pulse Rate Test,
Difference between Standing and Horizontal
Systolic Blood Pressure Test, Pulse Pressure
times Pulse Rate Test, Pulse Pressure times
Pulse Rate divided by Diastolic Pressure
Test, Crampton Blood Ptosis Test, and the
McCurdy-Larson Organic Efficiency Test.
The organization of the groups for statistical
treatment was as follows: (1) the freshman
group of approximately 450 subjects
(“Good” group) was used to indicate the
consistency between the eleven tests in
classifying these subjects;¹ (2) the efficient groups
(“Good”: athletic, “Good”: freshmen, and
“Good”: athletic and freshmen) were compared
with the inefficient group (“Poor”:
infirmary patients) to determine the validity
of the various cardio-vascular tests; (3) the
results of the examination of twenty-seven
subjects in the fall were compared with those
of the swimming examination after six weeks
of training, and the results of the fall
examination of seventy-three subjects were compared
with the results of the infirmary examination
of these subjects after they had spent two or
more days in bed with elevated temperature;
and (4) the significant test items were
determined by using the athletic group (varsity
and Olympic swimmers) and the infirmary
patients as the criteria.

¹ The original group consisted of 460 subjects. Not all the measurements were secured for every subject, however, and some questionable measurements were discarded. The number of subjects therefore ranges from 460 to 308 (see Table I).
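Several of the eleven tests are simple arithmetic on resting measurements. The sketch below computes pulse pressure (systolic minus diastolic), pulse pressure times pulse rate, the same product divided by diastolic pressure, and a standing-minus-horizontal difference, for hypothetical readings. It yields raw index values only: the article does not give the scoring tables or norms used to grade them, and the sign convention for the standing-horizontal differences is an assumption. All function names are illustrative.

```python
def pulse_pressure(systolic, diastolic):
    """Pulse pressure in mm Hg: systolic minus diastolic."""
    return systolic - diastolic

def pp_times_pr(systolic, diastolic, pulse_rate):
    """Pulse pressure multiplied by pulse rate (beats per minute)."""
    return pulse_pressure(systolic, diastolic) * pulse_rate

def pp_times_pr_over_diastolic(systolic, diastolic, pulse_rate):
    """Pulse pressure times pulse rate, divided by diastolic pressure."""
    return pp_times_pr(systolic, diastolic, pulse_rate) / diastolic

def standing_minus_horizontal(standing, horizontal):
    """Difference between a standing and a horizontal reading (assumed sign)."""
    return standing - horizontal

# Hypothetical readings for one subject.
print(pp_times_pr(120, 80, 70))
print(pp_times_pr_over_diastolic(120, 80, 70))
print(standing_minus_horizontal(84, 70))     # pulse rate, beats per minute
print(standing_minus_horizontal(118, 122))   # systolic pressure, mm Hg
```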

The statistical techniques used, with the 
exception of the product-moment correlation 
and the multiple factor analysis to determine 
test consistency, were those which express the 
significance of the difference between groups. 
These are the mean with the standard error, 
the difference in the means with the standard 
error of the difference, the critical ratio, and 
the bi-serial correlation. 
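The group comparisons rest on a few standard formulas: the standard error of a mean, σ/√N; the standard error of a difference between two means; and the critical ratio, which is the difference divided by its standard error. The following is a minimal Python sketch, assuming independent groups by default. The optional `r` parameter shows the usual adjustment when the same subjects are measured twice, as in the fall and infirmary comparisons. The function names and sample figures are hypothetical, not taken from the study.

```python
import math

def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean: sigma / sqrt(N)."""
    return sd / math.sqrt(n)

def se_difference(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of the difference between two means.

    r = 0 for independent groups; for the same subjects measured twice,
    r is the correlation between the paired scores.
    """
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

def critical_ratio(mean1, sd1, n1, mean2, sd2, n2, r=0.0):
    """Difference between two means divided by its standard error."""
    se1, se2 = se_mean(sd1, n1), se_mean(sd2, n2)
    return (mean1 - mean2) / se_difference(se1, se2, r)

# Hypothetical "Good" and "Poor" group figures, for illustration only.
cr = critical_ratio(mean1=60.0, sd1=10.0, n1=100, mean2=52.0, sd2=12.0, n2=64)
print(f"critical ratio = {cr:.2f}")
```

The larger the critical ratio, the less likely it is that the difference between the groups arose by chance.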

The data for the study were secured at 
Springfield College and Yale University. The 
freshmen, varsity swimmers, and infirmary 
groups were Springfield College students. The 
Olympic subjects were examined while in 
training at Yale University in preparation 
for the Olympic Games in Germany. The 
Olympic swimmers were under the direction 
of Coach R. J. H. Kiphuth of Yale
University.¹


CONSISTENCY OF CARDIO-VASCULAR 
TESTS IN CLASSIFICATION 


The purpose of a cardio-vascular test is 
to discover the unfit and to determine the 
classification of the physiologically fit. The 
eleven cardio-vascular tests should therefore 


1 The writer wishes to express appreciation to Coach R. J. H. 
Kiphuth for his cooperation and encouragement in this experi- 
mental work. The examinations were made by Dr. J. H. 
McCurdy and L. A. Larson just before the athletes sailed for 
Germany. 


TABLE I

CONSISTENCY OF TESTS IN INDICATING CARDIO-VASCULAR EFFICIENCY¹

McCloy with: Barach -.4628 ± .025 (454); Stone -.2856 ± .029 (454); Basal Metab. -.5844 ± .021 (460); Tigerstedt -.2651 ± .029 (454); Diff. in P.R. -.3704 ± .030 (375); Diff. in Systolic .3044 ± .029 (429); PP × PR -.4212 ± .026 (454); PP × PR ÷ Dias. -.6172 ± .023 (454); Crampton .4166 ± .029 (428); Organic Eff. -.1654 ± .028 (545)

Barach with: Stone -.0044 ± .031 (456); Basal Metab. .7202 ± .015 (460); Tigerstedt -.0133 ± .033 (429); Diff. in P.R. .4131 ± .030 (336); Diff. in Systolic .1007 ± .032 (428); PP × PR .4276 ± .026 (455); PP × PR ÷ Dias. .2644 ± .030 (429); Crampton -.0405 ± .033 (426); Organic Eff. -.3851 ± .024 (592)

Stone with: Basal Metab. .4342 ± .026 (455); Tigerstedt .6808 ± .017 (434); Diff. in P.R. -.1838 ± .033 (388); Diff. in Systolic -.0525 ± .032 (432); PP × PR .5560 ± .022 (453); PP × PR ÷ Dias. .6125 ± .020 (456); Crampton .0178 ± .032 (437); Organic Eff. .6473 ± .016 (593)

Basal Metabolic with: Tigerstedt .6079 ± .020 (459); Diff. in P.R. .2596 ± .030 (433); Diff. in Systolic .0991 ± .032 (434); PP × PR .8925 ± .007 (458); PP × PR ÷ Dias. .8126 ± .011 (459); Crampton .0668 ± .032 (433); Organic Eff. .2177 ± .030 (457)

Tigerstedt with: Diff. in P.R. -.2192 ± .035 (359); Diff. in Systolic .1254 ± .032 (428); PP × PR .8547 ± .009 (459); PP × PR ÷ Dias. .7815 ± .013 (428); Crampton .2944 ± .030 (427); Organic Eff. .2825 ± .026 (593)

Diff. in P.R. (Hor.-Std.) with: Diff. in Systolic -.1969 ± .035 (337); PP × PR -.0140 ± .035 (379); PP × PR ÷ Dias. .0047 ± .038 (308); Crampton .5656 ± .023 (394); Organic Eff. -.2466 ± .031 (428)

Diff. in Systolic (Hor.-Std.) with: PP × PR .1862 ± .032 (424); PP × PR ÷ Dias. -.0789 ± .032 (426); Crampton .6941 ± .017 (427); Organic Eff. -.0497 ± .032 (427)

PP × PR with: PP × PR ÷ Dias. .9128 ± .005 (456); Crampton .1047 ± .032 (427); Organic Eff. .0392 ± .028 (594)

PP × PR ÷ Diastolic with: Crampton .0607 ± .032 (427); Organic Eff. .2050 ± .027 (598)

Crampton with: Organic Eff. .0770 ± .032 (427)

1 Content of cells: (1) correlation, (2) probable error, and (3) number of cases.
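Each cell of Table I reports a probable error alongside the correlation. The classical formula is P.E. = 0.6745(1 − r²)/√N. Because the P.E. depends on r, it also offers a rough check on a printed correlation. The sketch below uses a hypothetical helper that is not part of the original study. It reproduces the first entry of the table, McCloy with Barach.

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# McCloy vs. Barach in Table I: r = -.4628, N = 454, printed P.E. = .025
print(round(probable_error_r(-0.4628, 454), 3))  # 0.025
```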





TABLE II
ANALYSIS OF CARDIO-VASCULAR TESTS

Column groups: Case Studies (Fall-Swimming; Fall-Infirmary, N = 73); Validity, with “Good” criteria of swimmers, freshmen (500), and freshmen and swimmers, each against “Poor” infirmary patients; Factor Analysis (Rotated Factor Loadings).
agree in the separation of the efficient subjects from the inefficient. The eleven test scores secured on approximately 450 college freshmen, or the “Good” group, were intercorrelated to determine the degree of consistency. These correlations are presented in Table I. Only four relationships out of 55 reached or exceeded the .80 standard of significant consistency:¹ Pulse Pressure × Pulse Rate Test with Basal Metabolic Test, Pulse Pressure × Pulse Rate Test with Tigerstedt Test, Pulse Pressure × Pulse Rate divided by Diastolic Test with Basal Metabolic Test, and Pulse Pressure × Pulse Rate divided by Diastolic Test with Pulse Pressure × Pulse Rate Test.
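The count of four relationships out of 55 can be reproduced from Table I. Eleven tests give 11 × 10 / 2 = 55 pairs. The sketch below transcribes the correlations row by row and counts those at or above .80. It reads the McCloy and Difference in Systolic entry as .3044, which is the value consistent with its printed probable error of .029 and with this count. The data layout and names are illustrative.

```python
TESTS = ["McCloy", "Barach", "Stone", "Basal Metabolic", "Tigerstedt",
         "Diff. P.R.", "Diff. Systolic", "PP x PR", "PP x PR / Dias.",
         "Crampton", "Organic Efficiency"]

# Upper triangle of Table I: row i holds r of test i with tests i+1, i+2, ...
UPPER = [
    [-.4628, -.2856, -.5844, -.2651, -.3704, .3044, -.4212, -.6172, .4166, -.1654],
    [-.0044, .7202, -.0133, .4131, .1007, .4276, .2644, -.0405, -.3851],
    [.4342, .6808, -.1838, -.0525, .5560, .6125, .0178, .6473],
    [.6079, .2596, .0991, .8925, .8126, .0668, .2177],
    [-.2192, .1254, .8547, .7815, .2944, .2825],
    [-.1969, -.0140, .0047, .5656, -.2466],
    [.1862, -.0789, .6941, -.0497],
    [.9128, .1047, .0392],
    [.0607, .2050],
    [.0770],
]

def pairwise(tests, upper):
    """Yield (test_a, test_b, r) for every pair in the upper triangle."""
    for i, row in enumerate(upper):
        for k, r in enumerate(row):
            yield tests[i], tests[i + 1 + k], r

pairs = list(pairwise(TESTS, UPPER))
consistent = [(a, b, r) for a, b, r in pairs if abs(r) >= 0.80]

print(len(pairs))  # 55
for a, b, r in consistent:
    print(f"{a} / {b}: {r:.4f}")
```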

Two reasons for these relationships and 
the lack of relationship between the eleven 
cardio-vascular tests can be advanced. The 
first is that the tests are specific in their 
indication of some function of the circulatory- 
respiratory system, and these indicators are 
unrelated to other functions of the same 
systems. The second is that the tests lack validity and therefore do not agree in classification. The first reason is analyzed by use of Thurstone’s Multiple Factor Method; the second, by establishing criteria for determining test validity.²
The eleven cardio-vascular tests are described by four factors. The Basal Metabolic, Tigerstedt, Pulse Pressure × Pulse Rate, and Pulse Pressure × Pulse Rate divided by Diastolic Tests correlate highly (above .82) with factor one. This shows that factor one is circulation resistance as indicated by diastolic pressure.¹

In factor two, two highly significant 
correlations were found: Difference between 
Standing and Horizontal Pulse Rate Test, 
and Crampton’s Blood Ptosis Test. The factor 
was therefore identified as splanchnic vasomotor efficiency as indicated by the relationship between systolic pressure and pulse rate
in the standing position as compared to the 
horizontal. 

Four tests correlate significantly with factor three: the Difference in Systolics, Barach, Crampton, and Organic Efficiency Tests. After analyzing the tests, it was concluded that the factor is heart energy during systole in excess of diastolic pressure.

1 See Table I.

2 See Table II for results of factor analysis and validity coefficients.

1 Rotated factor loadings determined by the calculation method yield only one of a number of possible solutions. Other values may be secured by use of the graphic method; however, in this problem the identifications were possible using the calculation method.


In factor four, the most significant correla- 
tion was found in the Organic Efficiency Test. 
This seems to indicate that the factor is
respiratory efficiency. 

These four factors account for from 51.30 percent to 100 percent of the variance of the eleven tests; the uniqueness of the tests correspondingly ranges from 48.70 to 0 percent.
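In factor-analytic terms, the share of a test's variance that the common factors describe is its communality. The communality is the sum of the test's squared loadings, and uniqueness is the remainder. For this reason the two ranges above add to 100 percent (51.30 + 48.70). The following is a minimal sketch using hypothetical loadings on four factors. The figures are illustrative and are not the study's.

```python
def communality(loadings):
    """Sum of squared factor loadings for one test."""
    return sum(a * a for a in loadings)

def uniqueness(loadings):
    """Share of variance not accounted for by the common factors."""
    return 1.0 - communality(loadings)

# Hypothetical rotated loadings of one test on factors one to four.
loadings = [0.6, 0.5, 0.3, 0.2]
h2 = communality(loadings)
print(f"communality {h2:.2%}, uniqueness {uniqueness(loadings):.2%}")
```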


PHYSIOLOGICAL CHANGES IN TRAINING 
AND ILLNESS 


The purpose of the case study was to determine which of the eleven tests were significant in indicating the physiological changes in training and in illness. To determine the effects of training, the fall examinations of twenty-seven subjects were compared with the swimming examinations after six weeks of training for varsity swimming
competition; to determine the effects of ill- 
ness, the fall examinations were compared 
with the infirmary examinations of seventy- 
three subjects after being in bed for two or 
more days with some organic ailment. 


Only two tests of the eleven (Organic Efficiency Test and Barach Test) have significant differences between the means in the training group. Four tests of the eleven (Organic Efficiency Test, Stone Test, Tigerstedt Test, and Pulse Pressure × Pulse Rate divided by Diastolic Test) showed a high degree of significance in the differences between the means of the fall and infirmary examinations.¹

1 See Table II for complete results.
VALIDITY OF CARDIO-VASCULAR TESTS


To determine the validity of the eleven cardio-vascular tests, three pairs of typical physiological groups were used as the criteria: “Good” (varsity and Olympic swimmers) and “Poor” (infirmary); “Good” (college freshmen) and “Poor” (infirmary); and “Good” (swimmers and freshmen) and “Poor” (infirmary). The statistical methods described under “Experimental Procedure” were used to determine the significance of the test differences between the “Good” and “Poor” groups. The order of significance in terms of the three criteria is: Organic Efficiency Test, Stone Test, and Tigerstedt Test. The remaining tests have at most a slight degree of significance.
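Validity against a “Good” and “Poor” criterion is expressed by the biserial correlation between a continuous test score and the dichotomy. The sketch below uses the standard formula r_bis = ((M_p − M_q)/σ_t)·(pq/y). Here p and q are the proportions in the two groups, and y is the height of the unit normal curve at the point dividing them. The function and the sample scores are hypothetical, and the study's own computing routine is not described.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(scores, is_good):
    """Biserial correlation between scores and a Good/Poor dichotomy."""
    good = [s for s, g in zip(scores, is_good) if g]
    poor = [s for s, g in zip(scores, is_good) if not g]
    n = len(scores)
    p, q = len(good) / n, len(poor) / n
    sigma_t = pstdev(scores)              # SD of the whole group
    z = NormalDist().inv_cdf(q)           # point splitting the curve into q and p
    y = NormalDist().pdf(z)               # ordinate of the normal curve there
    return (mean(good) - mean(poor)) / sigma_t * (p * q / y)

# Hypothetical test scores for six "Good" and four "Poor" subjects.
scores = [14, 16, 15, 18, 17, 13, 9, 11, 10, 12]
flags = [True] * 6 + [False] * 4
print(round(biserial_r(scores, flags), 3))
```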


RELIABILITY AND OBJECTIVITY 


The reliability of the three significant tests and of all test items was determined by Larson, using 21 subjects. The reliability coefficients ranged from .6708 to .9740. The objectivity of all test items and of the two most significant tests was determined using student examiners. The objectivity correlations ranged from .4150 to .8812 (Table III). The conditions of this experiment, however, were not satisfactory: the examiners were inexperienced and were constantly hurried in their measurements. Such conditions could only lead to greater fluctuations in the test scores.


COMBINATION OF SIGNIFICANT TEST ITEMS 


The purpose of combining the test items was to develop, if possible, a cardio-vascular test which has a higher degree of


TABLE III 


RELIABILITY AND OBJECTIVITY OF PHYSIOLOGICAL MEASURES 


Tests 


Sit. Systolic Pressure
. Systolic Pressure 


BY eae 


. Diastolic Pressure 


i ee OD on eee 
i iniees 


Vital Capacity 


Breath-Holding after Ex.
Standing Pulse Rate minus P.R. 2 mins. after Ex.
Short Organic Efficiency Test
Organic Efficiency Test 
Stone Test 


Reliability (Examiner: Larson, 21 subjects)    Objectivity (Examiners: students, 180 subjects)

.9715    .7183
.9599    .8790
.9536    .7525
.9451    .4150
.8963    .6886
.9466    .6646
.8232    .6213
.9740    .8812
.7428    .6470
.8320    .6111
.7954
.6708    .5018
.8878    .4750
validity than any of the eleven tests in this research. The significant test items were determined by use of the “Good” group (varsity and Olympic swimmers) and the “Poor” group (college infirmary patients). Using these two typical physiological groups, nine significant test items were found.¹ The multiple correlation procedure was used with the good-poor criteria to determine the most efficient test battery, and ten physiological combinations were made for multiple correlation calculations. A short test was devised that consists of three items (sitting diastolic pressure, breath-holding after exercise, and standing pulse pressure) and has a multiple correlation of .7501. It has a higher degree of validity than any other test in the study except the McCurdy-Larson Organic Efficiency Test. Its validity coefficient is .7216, as compared to .7913 for the Organic Efficiency Test.


The three test items were placed on a T scale, weighted by the beta value, and then combined into a composite score. The classification scale is based on 1067 such composite scores. The scale is divided into ten divisions by the decile values.

1 See Table IV for statistical results in the development of the short test battery.

TABLE IV

STATISTICAL RESULTS IN DEVELOPMENT OF A SHORT BATTERY OF ORGANIC EFFICIENCY

CRITERIA
(a) “Good”: Olympic Swimmers (40) and Varsity Swimmers (60).
(b) “Poor”: College Infirmary Patients.

SIGNIFICANT CORRELATIONS (Bi-serial)*
0 = Organic Efficiency (“Good” and “Poor” Criteria)
1 = Sitting Diastolic Pressure                      -.6310
2 = Sitting Pulse Pressure                           .5427
3 = Standing Pulse Pressure                          .3822
4 = Breath-Holding 20 seconds after Exercise         .5354
5 = Vital Capacity                                   .3302
6 = Pulse Pressure ÷ Diastolic (Sit)                 .6354
7 = Pulse Pressure ÷ Systolic (Std)                  .4355
8 = Pulse Pressure × Pulse Rate [illegible]          .3738
9 = Pulse Pressure × Pulse Rate [illegible]          .4018

MULTIPLE CORRELATIONS (Good-Poor Criteria)
Selected battery (items 1, 3, and 4): R = .7501
Other combinations: R = .7583, .7540, .7566, .7489, .7141, .6995, .7123, .7837, .7186

* Only the significant correlations are listed. The original battery included twenty-six tests.
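The scoring procedure for the short test has four steps: T-scale each item, weight it by its beta, sum the weighted scores, and cut the composite distribution at the deciles. It can be sketched as follows. The beta weights and raw scores below are hypothetical placeholders, because the section does not report them. Sitting diastolic pressure gets a negative weight only because its biserial correlation with the criterion is negative (-.6310). In the study the deciles came from 1067 composite scores, not from the five shown here.

```python
from statistics import mean, pstdev, quantiles

def t_scores(values):
    """Convert raw scores to a T scale: mean 50, standard deviation 10."""
    m, sd = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / sd for v in values]

def composite(items, betas):
    """Beta-weighted sum of T-scaled items; items is a list of score columns."""
    scaled = [t_scores(col) for col in items]
    return [sum(b * t for b, t in zip(betas, row)) for row in zip(*scaled)]

def decile_class(score, cuts):
    """Return 1-10 according to the decile interval the score falls in."""
    return 1 + sum(score > c for c in cuts)

# Hypothetical raw data for five subjects.
diastolic = [70, 80, 75, 85, 65]          # sitting diastolic pressure
breath_hold = [40, 30, 35, 25, 45]        # breath-holding after exercise
pulse_pressure = [45, 35, 40, 30, 50]     # standing pulse pressure
betas = [-0.4, 0.35, 0.25]                # hypothetical beta weights

scores = composite([diastolic, breath_hold, pulse_pressure], betas)
cuts = quantiles(scores, n=10)            # nine decile points
print([decile_class(s, cuts) for s in scores])
```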


CONCLUSIONS 


(1) The eleven cardio-vascular tests selected for this research are not consistent in classifying pupils in functional efficiency, with the exception of four relationships out of a possible 55. There are two reasons for this lack of consistency: (a) the eleven tests are described by four different factors, with a range of variance of 51.30 percent to 100 percent, while the uniqueness variance ranges from 48.70 percent to 0 percent; and (b) the lack of validity. Only the McCurdy-Larson Organic Efficiency Test reached the .80 standard for test validity. The Stone Test, however, has a fair degree of validity. The best pupil classification is given by the Organic Efficiency Test.


(2) The McCurdy—Larson Organic Effi- 
ciency Test is the most significant in indi- 
cating the physiological changes due to 
training and to illness; the degree of signi- 
ficance is high for illness (.6454) and slight 
for training (.2810). 


(3) The McCurdy-Larson Organic Efficiency Test is the most valid test of cardio-vascular efficiency. The Stone and Tigerstedt Tests are lower in validity, yet have significant validity.


(4) The Differences in Pulse Rate Test, Pulse Pressure × Pulse Rate Test, and the Pulse Pressure × Pulse Rate divided by Diastolic Test have slight validity.


(5) The McCloy Test, Barach Test, 
Basal Metabolic Test, Differences in Systolic 
Test, and Crampton Test are invalid accord- 
ing to the “Good” and “Poor” physiological 
groups used as criteria in this research. 


(6) In order to use a cardio-vascular test for individual diagnosis, the test must be repeated once, with the mean score used as the index score. This procedure will increase the reliability to a significant value (r = .67 to r = .80).
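The quoted gain agrees with what the Spearman-Brown formula predicts when a measure is doubled in length, r₂ = 2r/(1 + r). Starting from r = .67, the mean of two administrations should reach about .80. The following is a minimal sketch. The function name is hypothetical, and the section does not state that this formula was the author's method.

```python
def spearman_brown(r: float, k: float = 2) -> float:
    """Predicted reliability when a measure is lengthened k times (k=2: mean of two trials)."""
    return k * r / (1 + (k - 1) * r)

print(round(spearman_brown(0.67), 2))  # 0.8
```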


(7) The “Short Organic Efficiency Test” 
developed in this study has a higher degree 
of validity than any of the tests in the study 
except the McCurdy—Larson Organic Effi- 
ciency Test. 
AN ANALYSIS OF SOME NEW STATISTICAL METHODS FOR
SELECTING TEST ITEMS*


Robert F. Barry
University of Rochester and John Marshall High School 


Any addition to the already long list of 
methods for evaluating test items should be 
defensible on one or more of three grounds: 
greater speed, greater validity, or greater reliability. The time-consuming labor of the “best” existing methods makes statistical selection of items impractical (10, 11, 13, 15, 18, 19, 21, 24) for those who should use it most, i.e., the public school teachers.

This paper sets forth a new method, whose 
chief virtues are speed and simplicity; com- 
pares the validity of items so selected with 
the validity of items selected by biserial r, 
and compares the consistency of its evalua- 
tion of items with the consistency of biserial 
r for the same items. 

The method gives an index of discrimina- 
tion which combines two distinct elements: 1) 
positional with respect to criterion cate- 
gories, and 2) quantitative with respect to 
deviations of obtained distributions from a 
standard distribution, category by category. 


DEVELOPMENT OF THE METHOD 


Degrees of ability commonly are expressed 
by the grade categories A, B, C, D, and E. 
In the most valid items, the proportion of 
successes should decrease in that order, 
ABCDE. Thus, proportionately more A 
pupils should pass an item than B pupils, 
more B’s than C’s, more C’s than D’s, and 
more D’s than E’s. 

Pupils must first be assigned to criterion 
categories by some independent objective 
measure. Then for each item, compute per 
cent success for each category. This makes 
it possible to arrange categories in order of 
their per cent success. In Table I this posi- 
tional factor is perfect for both items. How- 
ever, item 2 obviously is superior to item 1, 
since it discriminates by wider margins be- 
tween categories. 

The concepts underlying this positional factor, hereafter called Dp, are simple. Let 1 represent perfect order of categories in per cent passed. Let 0 represent complete reversal of order. Since there are 10 possible corrections in a 5-step scale, i.e., n(n − 1)/2, each correction is penalized by subtracting one-tenth from perfection, 1.

* Abstract of thesis submitted to the University of Rochester in partial fulfillment of master’s requirements. Under the direction of Dr. Jack W. Dunlap.

TABLE I
                 Item 1                 Item 2
                 Observed               Observed
                 Per Cent               Per Cent
Category         Success     Order      Success     Order
A                52          1          100         1
B                51          2          75          2
C                50          3          50          3
D                49          4          25          4
E                48          5          0           5
                 Dp = 1                 Dp = 1

For example, if ranking by per cent passing in each category gives the order BACDE, then there is one inversion, so one correction is necessary. Hence, Dp is .9. If the order is CABDE, two corrections are necessary, making Dp .8. Breaking a tie between categories is counted as half a correction.
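The positional factor can be computed by counting inversions. This reading treats each “correction” as one swap of adjacent categories, since the number of such swaps needed to restore the order ABCDE equals the number of out-of-order pairs. A tie counts as half a correction. The sketch below follows that reading. The function name is hypothetical, and the sample percentages are made up to produce the orders BACDE and CABDE discussed above.

```python
from itertools import combinations

def positional_dp(pct_success):
    """D_p for per cent success listed in criterion order A, B, C, D, E.

    Each pair of categories out of order costs one correction, a tie costs
    half; the total is divided by the n(n-1)/2 possible corrections.
    """
    corrections = 0.0
    for higher, lower in combinations(pct_success, 2):
        if higher < lower:
            corrections += 1.0
        elif higher == lower:
            corrections += 0.5
    n = len(pct_success)
    return 1 - corrections / (n * (n - 1) / 2)

print(positional_dp([52, 51, 50, 49, 48]))   # item 1 of Table I: 1.0
print(positional_dp([60, 70, 50, 30, 10]))   # order BACDE: 0.9
print(positional_dp([70, 60, 80, 30, 10]))   # order CABDE: 0.8
```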

Now consider the second factor, quantitative discrimination, Dq. Entirely distinct from the position of the category, Dq is to represent the closeness with which the obtained per cent pass approaches some standard per cent success for the categories.
Hence, we must first determine what standard 
per cents will be used. This involves: 1) the 
population in each category, and 2) the ideal 
spread between categories. Both of these are 
necessary in order to get a measure of the 
quantitative difference between items. 

The following assumptions form the basis 
for determining what percentage of success 
should be considered as “standard” for each 
of the five categories: 


1. That the optimum difficulty of an item 
is 50 per cent (5, 9, 12, 16, 17, 20, 22, 
23, 25). 

2. That the distribution of ability follows the normal curve. The proportion in each category is unimportant as long as it is reasonable. The ratio 10, 20, 40, 20, 10 per cent has the distinct advantage of being reasonable, simple to compute, and commonly used.


3. That the per cent passing in each category (the standards) should be respectively 95, 80, 50, 20, 5, with an average of 50 per cent.


Table II compares the items of Table I with these so-called standards. For each category of item 1, column 4 shows the difference between the observed per cent success and the standard. The summation of these differences is 144. To express this sum always as a decimal, divide it by the maximum possible summation of differences, which is 400. For item 1 this gives the decimal .36. However, desirability, i.e., closeness to the standards, would then be represented by a low value. Therefore, so that goodness is represented by a high value, the complement, .64, is used as the value of Dq.

The formula for Dq then becomes

$$D_q = 1 - \frac{\sum d}{400}$$

where d is the difference between the observed and the standard per cent success in each category.

The next task is to combine Dp and Dq into one value, the index of discrimination, I.D. This raises the question of their relative importance. It is difficult to obtain a criterion for evaluating these two, although Table VI will later show that they are of fairly equal importance in selecting items independently. Symonds (20) has shown that the reliability and validity of a test are functions of the item difficulty. Thurstone (22), Richardson (17), Urnbrock (23), Elveback (9), Cleeton (5), and Voss (25) have shown experimentally that both the reliability and the validity of an item decrease as difficulty varies in either direction from 50 per cent. Plotting the index values against the per cent of difficulty shows that the higher index values are concentrated between 80 and 20 per cent difficulty. The shape of the curve corresponds roughly with that of Thurstone (22) for reliability and with that of Voss (25) for validity.

At present, then, Dp and Dq are assumed to be equal in weight. Hence, they are combined by simple multiplication to give an index, I.D., which lies between 1 and 0. The index for item 1 becomes .64; for item 2, which is plainly superior, it becomes .95.
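The quantitative factor and the index can be sketched together. Dq uses the standards 95, 80, 50, 20, 5 and a maximum summed difference of 400. That maximum is the largest possible deviation in each category (95 + 80 + 50 + 80 + 95). I.D. is the product Dp × Dq. The function names are hypothetical. The two items are those of Tables I and II, for which Dp = 1.

```python
STANDARDS = [95, 80, 50, 20, 5]   # standard per cent success for A-E
WEIGHTS = [10, 20, 40, 20, 10]    # assumed per cent of pupils in A-E
MAX_SUM = sum(max(s, 100 - s) for s in STANDARDS)  # 400

def quantitative_dq(pct_success):
    """D_q = 1 - (sum of |observed - standard|) / 400."""
    total = sum(abs(o - s) for o, s in zip(pct_success, STANDARDS))
    return 1 - total / MAX_SUM

def index_of_discrimination(dp, pct_success):
    """I.D. = D_p * D_q, the two factors weighted equally."""
    return dp * quantitative_dq(pct_success)

# The standards average 50 per cent over the 10-20-40-20-10 distribution.
assert sum(w * s for w, s in zip(WEIGHTS, STANDARDS)) / 100 == 50

item1 = [52, 51, 50, 49, 48]
item2 = [100, 75, 50, 25, 0]
print(round(index_of_discrimination(1.0, item1), 2))  # 0.64
print(round(index_of_discrimination(1.0, item2), 2))  # 0.95
```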


COMPARING THIS METHOD WITH BISERIAL r


Both the index and biserial r were com- 
puted for 69 General Science items using 176 
pupils, against a criterion test of 66 items. 
The 22 “best” items and the 22 “worst” items 
by each method, together with a new crite- 
rion test, then were administered to another 
group of 176 pupils. 

In addition, both indices were computed on 
150 items from an Educational Measurements 
test using 250 college students. Sub-tests con- 
taining various numbers of “best” and 
“worst” items, together with a new criterion 
test of 150 items, then were administered to 
another group of 255 individuals. 


Table III shows the correlations between sub-tests and the criterion for the General Science data. The most desirable comparison is a high correlation for the best items, with a low correlation both for the worst items and for the best items versus the worst items. The items selected by the index are slightly, but not significantly, more valid than those selected by biserial r.

TABLE II
                 Standard     Item 1                    Item 2
                 Per Cent     Observed                  Observed
Category         Success      Per Cent    Difference    Per Cent    Difference
                              Success                   Success
A                95           52          43            100         5
B                80           51          29            75          5
C                50           50          0             50          0
D                20           49          29            25          5
E                5            48          43            0           5
Σ differences                             144                       20
Σd/400                                    .36                       .05
Dq                                        .64                       .95

TABLE III
GENERAL SCIENCE DATA
THE CORRELATIONS BETWEEN A CRITERION TEST AND VARIOUS COMBINATIONS OF ITEMS SELECTED BY THE TWO METHODS

                                          Index    Biserial r    Advantage of index
Best 22 items                             .699     .685          .014
Worst 22 items                            .562     .628          .066
Correlations of best 22 versus worst 22   .550     .594          .044

Table IV gives the Educational Measurements data. In the first four pairs of correlations, the index is slightly more valid in three cases and slightly less valid in the fourth. In the last two comparisons the situation is reversed, with biserial r somewhat more valid than the index.

Since all these differences in correlations are so small, the net result of this study is that there is no significant difference in the validity of the two methods.

The important practical reason for using the index in preference to biserial r is the speed of the method. Hollerith tabulating and sorting equipment was used in computing both values. With the index, counting the passes per category on 150 items required 45 minutes. With biserial r, 255 minutes were required to obtain the necessary totals for the same items. In the operations that followed machine tabulation, computing biserial r took four times as long as computing the index, although the most efficient computational methods available were used in both cases (7, 8).

OTHER INVESTIGATIONS PERTAINING TO THE INDEX

Having established that the index equals biserial r in validity and excels it in time of computation, several pertinent questions arise:


1. Since several investigators have shown that both validity and reliability are definitely related to difficulty, how are the index and biserial r related to difficulty?

2. Since the widespread use of a method for selecting test items depends on its ease of computation (2, 14), is it possible that still another method would save even more time than the index without affecting validity significantly?

3. Since the index is computed on the assumption that both Dp and Dq have selective validity, to what extent do they assist in the validity of the index, as might be indicated by their respective validities in selecting items independently?

4. Since, to a considerable extent, Dq is conditioned by difficulty, what is the

TABLE IV
EDUCATIONAL MEASUREMENTS DATA
THE CORRELATIONS BETWEEN A CRITERION TEST AND VARIOUS COMBINATIONS OF ITEMS SELECTED BY THE TWO METHODS

                                       Index    Biserial r
Best 25 items                          .765     .756
Worst 25 items                         .473     .468
Best 40 items                          .784
Worst 40 items                         .608     .620
Correlations between best and worst
TABLE V
COMPARISON OF THE CORRELATIONS OF THE FIVE METHODS 


12 items out of 69 22 items out of 69 


25 items out of 150 40 items out of 150


Gen. Science Gen. Science Educ. Meas. Educ. Meas. 
Method r Rank Method r Rank Method r Rank Method r Rank 
I. CORRELATION WITH THE CRITERION OF SUB-TESTS OF “BEST” ITEMS 
index ___ .624 3 index ___ .699 3 ingex ... .166 @2 index __. .784 3 
bis-r ___. .589 5 bis-r ____ .685 5 bis-r ..... .106 3 mer ous th 
Dy ancuce Se i ae 687 4 OO seiacieate .7386 4 DOW scmcancshial 820 |] 
ee 635 2 . ee .709 2 ee aw | _, PEF 808 2 
ae 597 4 {ss 74 1 . _—— 676 5 2 742 5 
II. CORRELATIONS WITH THE CRITERION OF SUB-TESTS OF “Worst” ITEMS 
index __. .457 2 index ___ .562 1 index ___ .473 3 index ___ .608 1 
bis-r _... .453 1 bis-r __._._ .628 3 bis-r _..._ .468 2 bis-r ____ .620 2 
| ee 483 3 . gees 629 4 Oe tases 429 1 Dp -----. .635 3 
ee 511 5 ao 613 2 eae. 563 4 eae 7124 4 
ae 492 4 diff ____. .689 5 a 655 5 -735 5 
III. INTRA-CORRELATIONS OF ABOVE (BEST vs. WORST) 
index ___ .389 2 index ___ .550 1 index ___ .409 3 index __. .574 2 
bis-r _... .870 1 bis-r _._.._ 594 3 ee | bis-r .... .600 1 
I: catia 405 3 I. Sesasteaenion .602 4 ON sctevacaxegae 3004 2 aie 620 3 
eee 481 5 aa 580 2 tet A777 4 ae eee 690 5 
Gu ..... 465 4 Ge cunnn ee CO ee sists 507 5 se .628 4 

validity of Dq alone as compared with that of difficulty alone in the selection of items?

Using the same data, all of these questions can be answered by extending the study to include similar numbers of items selected by Dp alone, Dq alone, and difficulty alone. All of these methods are simpler to compute.

This has been done in the manner used previously. The result is 40 correlations with the criterion and 20 intra-correlations, as shown in Table V. Altogether 12 groups of items are used: 4 “best” groups, 4 “worst” groups, and 4 “intra” groups. This makes 12 opportunities to compare the correlations of the five methods.

Table VI summarizes the rankings of the five methods in these twelve groups. Notably, the index is the only one of these methods whose correlation never ranked below third place.

Since the time for computation is an important practical consideration, records were kept to make possible the comparison shown in Table VII. Correcting the papers and obtaining the raw scores are common to all five methods, so the timing began as soon as the raw scores were obtained.

TABLE VI
SUMMARY OF THE RANKS ATTAINED BY THE FIVE METHODS IN THE TWELVE COMPARISONS

                        Ranks                     Total    Net
Method          1st   2nd   3rd   4th   5th       Ranks    Rank
index            3     4     5     0     0         26       1
biserial r                                         31       2
Dp               3     1     4     4     0         33       3
Dq               1     5     0     3     3         38       4
difficulty                                         52       5
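The Total Ranks column is the sum of each rank multiplied by the number of times it was attained. The index never ranked below third place in the twelve comparisons. Its 3 firsts and 4 seconds therefore leave 5 thirds, and 3·1 + 4·2 + 5·3 = 26, the figure in Table VI. The following is a short check, with a hypothetical helper name:

```python
def total_rank(counts):
    """Sum of rank x frequency; counts[i] is the number of (i+1)th places."""
    return sum((place + 1) * n for place, n in enumerate(counts))

index_counts = [3, 4, 12 - 3 - 4, 0, 0]   # never below third in 12 comparisons
print(total_rank(index_counts))            # 26
```

A lower total means better ranks overall, which is why the index takes net rank 1.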

TABLE VII 


COMPUTATION TIME FOR FIVE METHODS FOR
EQUAL NUMBER OF ITEMS 


RSM Reet BE Ate? 2 minutes 
Spe aed See te a 4 minutes 
RR a ee Sie erates 8 minutes 
EPR epecnee ee a enon ee epee 15 minutes 
I 60 minutes 


CONSISTENCY WITH WHICH A METHOD 
EVALUATES AN ITEM 


In the literature on item analysis, very little appears regarding the consistency of the coefficients obtained by various methods. In comparing validation methods, Barthelmess (3) correlated 25 item coefficients obtained with 98 pupils with the coefficients on the same 25 items obtained with 262 pupils, “including the first 98.” Because the original group was included in the computation for the second group, the two sets of coefficients are not independent. This lowers the value of her correlations, which ranged from .863 to .548 for the ten methods she studied. Cook (6) mentions the “relative stability of indices” as a factor to consider in future comparisons of validation methods. Abelson (1) made an empirical study of the McCall method. Using the coefficients on 40 items, computed with 68 and 69 pupils respectively, he obtained a correlation of .279 ± .099. The small number of items produces such a high P.E. as to make these findings of little value, and the small number of cases (68 and 69) further lowers the value of this study. Aside from these faults, he failed to compare this correlation with what another method might have obtained.

So little has been done in this field that the 
last part of this study is devoted to a com- 
parison of the consistency of measurement of 
the three best methods described earlier in the 
study. This phrase, Consistency of Measure- 
ment, must be distinguished from Stability 
of An Item. There has been some confusion 
in the use of these phrases in the past. Brigham (4) and Wilson (26) both use “stability of the item,” while Cook (6) uses “stability of the index.” The writer suggests that hereafter “consistency” be used only to refer to the index, and that “stability” be used only to refer to the item.


It is difficult to separate these two. Brig- 
ham (4) tabulates validity coefficients on 15 
items using two different groups of 500 each. 
He speaks of “the relationship evident between the two series of values,” but when they are correlated by ranks they yield only .28.
Even this small degree of relationship cannot 
be ascribed entirely to stability of the items 
until the consistency of measurement of the 
method of computing the coefficients has been 
determined. 


Before attempting an investigation of con- 
sistency, certain conditions must be met: 


1. Use as large a number of objective items as possible so as to reduce the P.E. of the correlations.

2. Use two groups, each as large as possible, so as to reduce the element of error in the original item indices.

3. Equate the two groups in both range and distribution as nearly as possible.

4. Have the instructional conditions for the two groups as nearly identical as possible.
To meet conditions 1 and 2, ninth-year General Science data were used because they afforded the maximum number of pupils who took an objective examination. Conditions 2 and 3 were met in Table VIII by the largest possible number of pupils, and in Table IX by actually equating the groups on the basis of scores on the entire test. To ensure instructional equality (condition 4), all data for Table VIII were taken from one school. Data for Table IX were taken from one school and at one examination, even at the sacrifice of numbers. Three validation methods were studied: biserial r, the index, and Dp.


Since ranking of items is the true purpose 
of item coefficients, and the coefficients them- 
selves are merely the means thereto, one table 
shows the correlations by ranks in addition to 
the correlations by coefficients. 
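Correlating by ranks, as in Table VIII, amounts to Spearman's rank-order coefficient, ρ = 1 − 6Σd²/[n(n² − 1)], applied to the items' positions on the two occasions. The following is a minimal sketch that assumes no tied values (ties would need averaged ranks). The function names and the coefficients below are hypothetical.

```python
def ranks(values):
    """Rank positions (1 = largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def spearman_rho(first, second):
    """Rank-order correlation between two sets of item coefficients."""
    n = len(first)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical index values for five items computed on two groups of pupils.
group_1 = [0.82, 0.64, 0.71, 0.35, 0.50]
group_2 = [0.78, 0.69, 0.60, 0.41, 0.47]
print(round(spearman_rho(group_1, group_2), 2))  # 0.9
```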


Three indices (the index, biserial r, and Dp) were computed twice for 66 items.
Twenty-four of these items were drawn from 
the January 1938 examination and the in- 
dices were based on 313 students. Forty-two 
of the items were selected from the June 1937 
examination and the indices were based on 
176 subjects. 


Sixty-five of these items reappeared in the 
June 1938 examination, which was given to 
319 subjects. Thus for these 65 items it was 
possible to compute each index on two sepa- 
rate groups. The 66th item was administered 
both in June 1937 and January 1938.


For each index the paired values were cor- 
related, with the results shown in Table VIII. 
The values were then ranked and the corre- 
lation between the ranks obtained. The values 
computed by the index were superior in con-
sistency to the values computed by either
biserial r or Dp.


It was possible to divide the June 1937 
General Science group into two groups, each 
of 176 students, equated as to mean score 
and standard deviation on the entire test. 
The index, biserial r, and Dp were computed
for 69 items for each group. This eliminates 
variation in instruction from year to year. 
For each method the correlations between the 
paired values are shown in Table IX. 
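Table VIII reports consistency in two ways: as a correlation of the paired coefficient values and as a correlation of their ranks. The sketch below shows both computations. The item values are hypothetical. The rank correlation is computed as Pearson's r on the ranks, which equals Spearman's rho when there are no ties. The article does not say which rank formula was actually used.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def consistency(first, second):
    """Correlation of paired coefficient values, and of their ranks."""
    return pearson_r(first, second), pearson_r(ranks(first), ranks(second))

# Hypothetical index values for the same eight items computed on two groups.
group_a = [0.42, 0.35, 0.61, 0.18, 0.50, 0.27, 0.55, 0.30]
group_b = [0.39, 0.31, 0.48, 0.22, 0.57, 0.20, 0.52, 0.33]
by_values, by_ranks = consistency(group_a, group_b)
print(f"values: {by_values:.3f}   ranks: {by_ranks:.3f}")
```

Ranking the values before correlating them suits the stated purpose, since the item coefficients are used only to order items.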


Examination of Tables VIII and IX indicates that the index is more consistent than biserial r or Dp. It might be argued that the differences are attributable to chance, in view of the small numbers of items, 66 and 69.










TABLE VIII


CONSISTENCY AS SHOWN BY CORRELATIONS OF COEFFICIENTS, GENERAL SCIENCE, NINTH YEAR: 66 ITEMS, 24 OF THEM FROM JANUARY 1938 (313 PUPILS) AND 42 FROM JUNE 1937 (176 PUPILS), WITH 66 ITEMS, 1 FROM JANUARY 1938 (313 PUPILS) AND 65 FROM JUNE 1938 (319 PUPILS)


          Index         Biserial r    Dp
Values    .693 ± .04    .559 ± .06    .506 ± .06
Ranks     .733 ± .04    .521 ± .06    .518 ± .06
TABLE IX


CONSISTENCY AS SHOWN BY CORRELATIONS OF ITEM COEFFICIENTS, 69 ITEMS, JUNE 1937 GENERAL SCIENCE EXAMINATION, TWO EQUATED GROUPS OF 176 EACH


          Index         Biserial r    Dp
Values    .784 ± .04    .456 ± .06    .469 ± .06


However, since the direction of the difference 
remains the same in both tables, this does not 
seem plausible. 
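The chance argument can be given a rough numerical form with the critical ratio used in work of this period. The critical ratio is the difference between two coefficients divided by the probable error of that difference. The sketch below assumes the two coefficients are independent, so that the P.E. of the difference is the square root of the sum of the squared P.E.s. Both coefficients in each comparison come from the same items, so this assumption holds only approximately.

```python
import math

def critical_ratio(r1: float, pe1: float, r2: float, pe2: float) -> float:
    """Difference between two coefficients divided by the P.E. of the difference.

    ASSUMES independence: P.E.(diff) = sqrt(pe1**2 + pe2**2).
    """
    return (r1 - r2) / math.sqrt(pe1 ** 2 + pe2 ** 2)

# Values rows of Tables VIII and IX: the index versus biserial r.
table_viii_ratio = critical_ratio(0.693, 0.04, 0.559, 0.06)
table_ix_ratio = critical_ratio(0.784, 0.04, 0.456, 0.06)
print(f"Table VIII: {table_viii_ratio:.2f}   Table IX: {table_ix_ratio:.2f}")
```

Using the P.E.s printed in the tables, the difference between the index and biserial r is about 1.9 times its probable error in Table VIII and about 4.5 times in Table IX.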


CONCLUSIONS 
1. The order of validity as shown by the
twelve comparisons of the five methods is
first, the index, then biserial r, then Dp,
then Dg, and lastly difficulty.


2. The order of computational time from
fastest to slowest is first, difficulty, then
Dp, then Dg, then the index, and lastly
biserial r.


3. The order of consistency for the three most
valid indices is first, the index, then bi-
serial r, then Dp.


4. Introducing the factor of consistency 
lowers the importance of the factor of val- 
idity as a criterion for selecting a valida- 
tion method. 


5. The lack of absolute consistency of meas- 
urement lowers the importance of validity, 
and to some extent raises the importance 
of computational time. 

6. Dg alone and difficulty alone are not worth 
using because of low powers of discrimina- 
tion. 

7. The index is slightly more valid, consid- 
erably more consistent, and four times as 
fast as biserial r. 

8. Dp is nearly as valid, nearly as consistent,
and seven times as fast as biserial r.
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THE VALIDITY OF THE MACHINE-SCORABLE COOPERATIVE 
ENGLISH TEST 


CONSTANCE M. McCULLOUGH
Hiram College, and


JOHN C. FLANAGAN
Cooperative Test Service 


With the development of large-scale test- 
ing programs, schools and colleges have ex- 
perienced a growing need for more efficient 
methods of scoring examination papers. 
Teachers in large city systems burdened with 
clerical work, persons responsible for the 
speedy scoring of placement examinations, 
and all those who feel that a teacher should 
be something more than a fixture behind a 
moving red pencil have been aware of this 
need for greater efficiency in test scoring de- 
vices. In the construction of the Cooperative 
English Test Form OM,* the Cooperative 
Test Service of the American Council on 
Education has attempted to fill this need. 

Form OM of the English test is a seventy- 
minute, controlled response, multiple choice 
test, in which the student responds to the 
items by marking an answer sheet. The rapid- 
ity with which such an examination may be 
scored is due chiefly to the fact that the re- 
sponse is always a choice which may be indi- 
cated by the mere position of a mark. The 
counting of the correctly placed marks is 
obviously much simpler than the scoring of 
tests in which the scorer must constantly con- 
sider the correctness of unique answers and 
the value of partially correct answers. The 
use of the answer sheet eliminates the turn- 
ing of test booklet pages in scoring. If a scor- 
ing machine is available, the answer sheets 
are scored with even greater dispatch. At the 
conclusion of the examination period the test 
booklets may be put aside for use in another 
testing program whose expense is merely the 
purchase of a new supply of answer sheets. 
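Scoring by position, as described above, reduces to comparing each marked position with a key and counting the matches. The sketch below assumes a hypothetical layout in which each item's mark is recorded as a letter, or None when the item is left blank. It applies no correction for guessing, because the article does not mention one.

```python
def score_answer_sheet(marks: dict, key: dict) -> int:
    """Count the items whose marked position matches the keyed position.

    `marks` maps item number -> marked position (None if blank);
    `key` maps item number -> correct position. Both layouts are hypothetical.
    """
    return sum(1 for item, correct in key.items() if marks.get(item) == correct)

key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "E"}
marks = {1: "B", 2: "C", 3: "A", 4: None, 5: "E"}
print(score_answer_sheet(marks, key))  # items 1, 3, and 5 match the key
```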

Unlike previous forms of the Cooperative 
English tests, the machine-scorable form em- 
ploys the answer sheet technique in its usage 
sections. Like earlier forms, however, it com- 
prises tests of English usage, spelling, and 
vocabulary. The usage part consists of 60 
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crucial points of grammar and diction (12 
minutes), 60 common uses of punctuation (15 
minutes), 30 typical items of capitalization 
(5 minutes), and 15 items on sentence struc- 
ture, each requiring the selection of the best 
of four sentences (8 minutes). The spelling 
part (10 minutes) contains 45 items of four 
words each, of which one or none is mis- 
spelled. There are 100 test words in the 
vocabulary part (20 minutes), for each of 
which the word nearest in meaning is to be 
chosen from five possibilities. 
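The part-by-part figures in the paragraph above can be tallied and checked against the stated seventy-minute total. The sketch below only does bookkeeping on the numbers given in the text. The dictionary layout is an illustrative choice.

```python
# Parts of Form OM as described in the text: (number of items, minutes allowed).
usage_sections = {
    "grammar and diction": (60, 12),
    "punctuation": (60, 15),
    "capitalization": (30, 5),
    "sentence structure": (15, 8),
}
other_parts = {"spelling": (45, 10), "vocabulary": (100, 20)}

usage_items = sum(items for items, _ in usage_sections.values())
usage_minutes = sum(minutes for _, minutes in usage_sections.values())
total_minutes = usage_minutes + sum(minutes for _, minutes in other_parts.values())

print(usage_items, usage_minutes, total_minutes)
```

The tally gives 165 usage items in 40 minutes and 70 minutes in all. This agrees with the seventy-minute length stated above and with the 40-minute usage section referred to later in the article.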


Since the publication of the new form in 
May 1938, a number of critics have voiced 
more loudly their objections to objective Eng- 
lish tests. It has been said that multiple 
choice items “give away” correct answers; 
that the use of the answer sheet, especially in 
the case of the punctuation section, puts a 
premium upon intelligence and puzzle-solving 
aptitude; that such a test is not a valid index 
of composition ability; and that an essay 
examination is the only appropriate way of 
measuring correctness and power of English 
expression. 


Through the cooperation of seven schools 
in four cities during May and June of 1938, 
data involving 2,000 high school students 
have been gathered which provide evidence 
on these much-debated issues and which es- 
tablish the answer sheet technique as unques- 
tionably adaptable to the field of English. 
Correlations have been obtained, using the 
scores made by large groups of high school 
students, which show substantial relation- 
ships between scores on the new test and 
the following criteria of validity: a minimum 
essentials test of grammar, scores on New 
York Regents’ examinations in third- and 
fourth-year high school English, which are 
three-hour examinations of the essay type, 
and the fifty-minute usage section of the Co- 
operative English Test Form 1937, largely a 
proof-reading test permitting free response 
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written in the test booklet. An additional 
study has been made to show the relationship 
between the usage section of the OM form 
and teachers’ estimates of the students’ com- 
position abilities. Further study has consid- 
ered variations in school population and dif- 
ferences among groups of students segregated 
according to grade and ability within a given 
school. It is the purpose of this article briefly 
to set forth these findings. 


FORM 1937 AND FORM OM USAGE AND A
MINIMUM ESSENTIALS TEST OF
GRAMMAR


Evidence upon the relationship between a 
minimum essentials test in English grammar 
and Form OM and Form 1937 of the Cooper- 
ative English Test has been obtained in the 
high school of Suburban City A, a wealthy 
community near New York City. The mini- 
mum essentials test in question is a free re- 
sponse grammar hurdle of the type familiar 
to many English teachers. It resembles the 
Cooperative English tests in the inclusion of 
usage items. It is unlike these tests in that 42 
per cent of its content is of a technical nature, 
involving knowledge of grammar rules and
terms rather than practical mastery of English
correctness.


In May 1938 a class of 247 10th grade students in Suburban City A was divided into two matched groups of approximately equal ability in the use of English, according to teachers’ estimates of oral and written work. The 119 students comprising one group were given the 40-minute usage section of the Form OM test. The other 128 students were given the 50-minute usage section of the Form 1937 test. Both groups took the 40-minute minimum essentials test.


A correlation coefficient of .669 reveals the 
definite relationship between the usage section 
of the OM form and the minimum essentials 
test. The correlation of the minimum essen- 
tials test scores with the Form 1937 usage 
scores yields a coefficient of .653. There is 
reason to believe that a separate score for 
usage on the minimum essentials test, exclud- 
ing the technical grammar items, would have 
been more closely related to the usage scores 
on the Cooperative English tests. 
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TOTAL SCORES ON FORMS OM AND 1937, AND
NEW YORK REGENTS’ EXAMINATIONS
IN ENGLISH


Two hundred and eighty 11th grade stu- 
dents and 230 12th grade students in Sub- 
urban City A were given New York Regents’ 
examinations in English in the June following 
the May administration of the Forms OM 
and 1937 tests. The Regents’ examinations in 
English represent the essay type of examina- 
tion which, in the opinion of those who oppose 
objective testing, is the only valid type of 
measuring instrument in the field of English. 
The two grades, each divided into comparable 
halves according to teachers’ estimates of 
ability to use English with correctness and 
ease, were then examined, half by the Form 
OM test and half by the Form 1937 test. 
Correlations of Regents’ test scores with the 
total English OM scores, representing objec- 
tive measurement of usage, spelling, and 
vocabulary, are .793 in the 11th grade and
.698 in the 12th grade. Regents’ scores and 
the Form 1937 scores are correlated .769 in 
the 11th grade and .695 in the 12th grade. 
The high degree of relationship between the 
essay type examination and the objective type 
is manifest in these indices. Were the essay 
examination more reliable, these correlations 
would probably be higher. 
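The remark about a more reliable essay examination can be made concrete with the classical correction for attenuation, applied to the criterion only. The correlation with the essay test's true score is r/√r_yy. The observed correlation with an essay test of reliability r'_yy would then be that value times √r'_yy, which gives r' = r√(r'_yy/r_yy). The article reports no reliability for the Regents' examination. The reliabilities below are therefore purely hypothetical, and the sketch assumes the improved examination measures the same ability.

```python
import math

def r_with_more_reliable_criterion(r_xy: float, rel_old: float, rel_new: float) -> float:
    """Attenuation adjustment on the criterion only: r' = r_xy * sqrt(rel_new / rel_old)."""
    if not (0 < rel_old <= 1 and 0 < rel_new <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy * math.sqrt(rel_new / rel_old)

# HYPOTHETICAL reliabilities for the Regents' essay examination (none are reported).
ESSAY_RELIABILITY_NOW = 0.60
ESSAY_RELIABILITY_IMPROVED = 0.80

# Observed correlations of Form OM total scores with the Regents' examination.
observed = {"11th grade": 0.793, "12th grade": 0.698}
for grade, r in observed.items():
    adjusted = r_with_more_reliable_criterion(r, ESSAY_RELIABILITY_NOW, ESSAY_RELIABILITY_IMPROVED)
    print(f"{grade}: observed {r:.3f}, with a more reliable essay test {adjusted:.3f}")
```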

Correlations of Regents’ scores and the 
Form OM usage part alone are .708 for the 
11th grade and .650 for the 12th, while those
of Regents’ scores with the Form 1937 usage 
part are .731 for the 11th grade and .580 for 
the 12th. The coefficients for the 11th grade 
are logically higher than those for the 12th, 
since the third year high school Regents’ ex- 
amination in English contains more usage 
items than the fourth year examination, 
which is more concerned with literary criti- 
cism and acquaintance. Had scores on the 
Cooperative Literary Acquaintance and the 
Cooperative Literary Comprehension tests 
been added to the Cooperative English usage 
scores in computing the correlation with the 
Regents’ examination scores, doubtless the 
resulting coefficients would have been higher 
than those presented above. 


FORM OM USAGE AND FORM 1937 USAGE


Because the Form OM
usage part is completely objective whereas 
the previous usage parts of the Cooperative 
English tests have been of the proof-reading 
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and completion type, special interest has cen- 
tered about the relationship between scores 
on Form 1937 and Form OM usage parts. 
Four high schools in two cities have con- 
tributed data for this study. One high school 
represents average ability and socio-economic 
status in Suburban City B near New York 
City. Three high schools, located in a large 
industrial city in the Middle West, represent 
the full scope of abilities and socio-economic 
status in that community. 


In all four schools students were admin- 
istered both the 1937 and the OM forms of 
the English usage test. Alternate students 
took Form OM first so that the practice effect 
of the tests might be equalized. In Suburban 
City B, where 350 9th graders were examined, usage scores on the two forms yield a coefficient of .842. For 103 10th grade students in the midwestern city a coefficient of
.848 is obtained; for 140 11th grade students 
a coefficient of .854; and for 112 12th 
grade students a coefficient of .881. Plainly 
the two types of test are measuring similar 
abilities. 


Of the two types, Form OM has at least two advantages. It requires about nine seconds in the scoring machine, or slightly more than a full minute for hand-scoring, while the scoring of Form 1937 requires at least ten minutes on the part of a skilled
reader. The OM form of usage test is to be 
preferred also for the fact that it contains 
thirty more points than the 1937 test and 
that it yields a standard deviation ten points 
greater in the typical high school group. In 
other words, the objective form tests more 
points of usage than the Form 1937 and 
shows greater variation in English usage 
scores in typical groups, according to our test 
results. 
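The per-paper scoring times above can be put on a common footing by converting them to total hours for a batch of papers. The batch of 350 matches the size of the Suburban City B ninth grade and is used only for illustration. The one-minute and ten-minute figures are the lower bounds the text gives: "slightly more than a full minute" and "at least ten minutes".

```python
# Per-paper scoring times for the usage part, in seconds, taken from the text.
SECONDS_PER_PAPER = {
    "Form OM, scoring machine": 9,
    "Form OM, hand-scored": 60,        # "slightly more than a full minute": lower bound
    "Form 1937, skilled reader": 600,  # "at least ten minutes": lower bound
}

def hours_to_score(papers: int, seconds_each: int) -> float:
    """Total scoring time in hours for a batch of papers."""
    return papers * seconds_each / 3600

# Illustrative batch the size of the Suburban City B ninth grade.
for method, secs in SECONDS_PER_PAPER.items():
    print(f"{method:<27} {hours_to_score(350, secs):6.2f} hours")
```

On these assumptions, 350 usage papers take under an hour by machine and roughly six hours by hand. The same papers take at least 58 hours in Form 1937.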




FORM OM AND ABILITY GROUPING


A suggestion of the extent to which a 
school’s judgment of students’ academic abil- 
ities agrees with the measures of English abil- 
ity given by the Form OM test is shown in a 
study of a New England city high school. 
Students in the school are segregated into 
three curriculum divisions according to their 
marks prior to entrance into the ninth grade. 
Students of academic promise enter a typical 
college preparatory course. Average students 
whose aspirations and abilities do not suggest 
formal education beyond high school are 
given a social arts course of modified content 
and academic standard. Those students who 
are deficient in language skills and scholastic 
ability follow a technical arts course. In so 
far as academic success is determined by mas- 
tery of language, students’ scores on the Eng- 
lish OM test should discriminate among 
these groups. 

In Table I the average scores of 285 10th and 11th grade students who took the Form OM test are considered separately in three curriculum divisions. The average scores are comparatively low (note the total possible scores) because they represent grades 10 and 11, whereas the test is suitable for college students as well. The differences among these mean raw scores are obvious.

Suburban City B sections its 9th grade
students into eleven ability groups, in which
placement is determined on the basis of
native ability (as measured by a group intelli-
gence test), English ability, and achievement
in previous school subjects. This sectioning
occurs at the beginning of the 9th grade. The
scores of 350 9th grade students on the OM
usage test correlate .762 with this ability 
grouping. If it seems surprising that the cor- 
relation is so high, it should be remembered 


TABLE I 


MEAN RAW SCORES ON FORM OM FOR NEW ENGLAND CITY CURRICULUM GROUPS,
GRADES 10 AND 11


No. of 
Cases 


Curriculum Group 
College Preparatory 


Social Arts 
Technical Arts 


Total Possible Score 


Mean Raw Scores 
Usage Spelling Vocabulary 
81.0 17.7 36.1 
87.0 19.4 38.0 
57.6 9. 17.9 
66.2 14, 24.2 
7 
6 


Grade 


44.3 13.9 
44.4 


165.0 45. 


AOR 


23.2 
169.0 
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that the majority of high school subjects of 
the traditional type require the use of Eng- 
lish skills, and that verbal intelligence is 
measured in the group test. Scores on the 
usage section of Form 1937 for these same 
students produce a coefficient of .798 with 
the ability grouping. Apparently the free- 
response type of test (Form 1937) taken 
without an answer sheet and the objective 
Form OM, which is completely machine- 
scorable and administered with an answer 
sheet, are about equally related to the factors 
which are the basis of sectioning in this 
school. 


USAGE FORM OM AND VERBAL INTELLIGENCE


The punctuation section of the Form OM 
usage test has aroused much comment be- 
cause the student is required not only to de- 
cide what punctuation is needed at certain 
numbered points in a passage but to place a 
pencil mark under the appropriate punctua- 
tion marks on the answer sheet. In spite of 
the fact that the administration of the OM 
form is preceded by special exercise on a 
practice answer sheet, some critics fear that 
the specialized nature of the response to the 
punctuation section necessitates unusual in- 
telligence, puzzle-aptitude, and, in the words 
of one facetious commentator, “tweezer 
dexterity”. 

The vocabulary parts of the Cooperative 
English tests have always correlated closely 
with verbal intelligence. Should the vocabu- 
lary part of the test be found to correlate 
more highly with the usage part of the OM 
form than with the usage part of the 1937 
form, the implication would be that the OM 
form required more intelligence. If the vocab- 
ulary part correlated more closely with the 
punctuation section than with the OM usage 
part as a whole, obviously the punctuation 
section would be requiring more intelligence 
than the usage part. 
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Certain obstacles confront the investigator 
in a study of these relationships. One is that 
no direct comparison between the punctuation 
items of the 1937 and OM forms is possible, 
since the 1937 usage part offers a running 
passage for usage correction with no separate 
consideration of punctuation. Another is that 
matters of punctuation may or may not actu- 
ally require more intelligence than matters of 
grammatical usage, sentence structure, and 
capitalization. A third is that, given two 
measures having the same true relationship 
with a third, the measure involving the more 
items will in general yield a higher correla- 
tion with the third because of its greater dis- 
criminatory power, and the measure requiring 
the greater length of time will produce more 
reliable indices of this relationship. The punc- 
tuation section of the OM form has a time 
limit of 15 minutes, while the entire OM 
usage part is given 40 minutes, and the usage 
part of Form 1937 is a 50 minute test. 
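The third obstacle, that a longer measure yields more reliable indices, is usually quantified with the Spearman-Brown prophecy formula, r_k = kr / (1 + (k - 1)r). Here k is the factor by which the test is lengthened with parallel material. The sketch below is a standard illustration and not a computation from the article. The reliability of the punctuation section is a hypothetical value, and the number of items is assumed to grow in proportion to time.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by `length_factor`
    with parallel material: k*r / (1 + (k - 1)*r)."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)

# HYPOTHETICAL reliability of the 15-minute punctuation section.
punctuation_reliability = 0.80

# Predicted reliability if the section ran as long as the 40-minute OM usage part.
print(round(spearman_brown(punctuation_reliability, 40 / 15), 3))
```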


In spite of all these uncertainties the data 
presented in Table II are rather convincing in 
their support of the thesis that the OM form 
of the usage test, including the punctuation 
section, is quite comparable to the 1937 form 
in respect to the factors tested, and appar- 
ently does not discriminate to a greater extent 
against students of poor verbal ability. In 
grades 10, 11, and 12 in the Midwestern city 
the coefficients of correlation between the two 
forms of the usage test range from .85 to .88. 
The usage part of Form 1937 correlates 
slightly more closely with vocabulary scores, 
and the coefficient designating the relationship 
between punctuation scores and vocabulary 
scores is lower than that for the entire OM 
usage part. 

A new practice sheet has been issued by 
the Test Service which deals specifically with 
the type of response called for in the punctua- 
tion section. If an individual student is of 
such low verbal intelligence that the direc- 


TABLE II


RELATIONSHIPS AMONG USAGE, PUNCTUATION, AND VOCABULARY SCORES IN GRADES 10, 11, AND 12 IN A MIDWESTERN CITY, FORMS 1937 AND OM


                                  Coefficients of Correlation
Form and Grade        No. of      Form 1937   Form OM    Punctuation
                      Cases       Usage       Usage
Form OM Usage
  10th Grade          103         .848
  11th Grade          140         .853
  12th Grade          112         .881
Form OM Vocabulary    356         .724        .690       .622
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tions for the punctuation section present espe- 
cial difficulty, the new practice sheet should 
remove this barrier before the actual admin- 
istration of the examination. So far, the Test 
Service has no evidence on the relationship 
between punctuation scores and “tweezer 
dexterity”! 


USAGE OM AND 1937, AND TEACHERS’
ESTIMATES OF COMPOSITION
ABILITY


Although marks in free composition are 
notoriously unreliable, they are probably the 
most common and specific measure of expres- 
sion in English. Teachers in the Midwestern 
city high schools whose students took the Co- 
operative tests were asked to indicate each 
student’s composition ability, both oral and 
written, by the letter grade A, B, C, D, or E, 
according to his standing in his grade and 
school. Table III shows the coefficients pro- 
duced by correlation of the usage parts of the 
OM and 1937 forms with the teachers’ 
estimates. 

The fourteen coefficients vary from .165 to 
.772. Nine of them are larger than .500. The 
variation observable in the table is spuriously large because of the small sizes of the groups measured and the paucity of letter-grade classifications. Differences of opinion among teachers as to what factors constitute successful
composition and in what proportion they con- 
stitute it are another possible reason for the 
vagaries in the coefficients. Teachers differ, 
too, in the amounts and kinds of evidence 
they have of students’ abilities to use English 
with correctness and ease. However, there can 
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be no doubt that some of the factors deter- 
mining the teachers’ estimates are present in 
the Cooperative English tests. While five of 
the seven pairs of coefficients in Table III 
suggest that the Form 1937 usage test is the 
better measure of composition ability, the dif- 
ferences are so slight that the evidence cannot 
be considered conclusive. 
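The summary figures quoted above can be checked directly against the fourteen coefficients of Table III, and so can the median of .53 given in the conclusions. The pairing of Form OM and Form 1937 values below follows the row order as read from the table. The range, the count above .500, and the median do not depend on that pairing.

```python
from statistics import median

# Table III coefficients as (Form OM, Form 1937) pairs, in table row order.
pairs = [
    (0.165, 0.394), (0.519, 0.706),                  # grade 10
    (0.391, 0.362), (0.678, 0.772), (0.525, 0.554),  # grade 11
    (0.615, 0.425), (0.539, 0.554),                  # grade 12
]
all_r = [r for pair in pairs for r in pair]

print(min(all_r), max(all_r))                  # range of the fourteen coefficients
print(sum(r > 0.500 for r in all_r))           # how many exceed .500
print(round(median(all_r), 3))                 # median of the fourteen
print(sum(f1937 > om for om, f1937 in pairs))  # pairs in which Form 1937 is higher
```

Run on these values, the sketch reproduces the range of .165 to .772 and the nine coefficients above .500. It also gives a median of .532 and five of seven pairs favoring Form 1937.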


CONCLUSIONS 
The conclusions which may be drawn from 
the foregoing data are as follows: 


1. The usage parts of both the Form 1937 
and the all-objective Form OM of the Co- 
operative English Test yield scores which 
agree fairly substantially with those obtained 
from a 10th grade test of minimum essentials
of grammar. The correlation coefficients are 
respectively .65 and .67 for the two forms. 


2. Scores on the usage parts of these two 
forms of the Cooperative English Test show 
a substantial amount of agreement with the 
third- and fourth-year English examinations of
the New York Board of Regents. The corre- 
lation coefficients range from .58 to .73 and 
are similar for the two forms. 


Combined scores for the usage, spelling, 
and vocabulary parts of the Cooperative Eng- 
lish Test show a somewhat closer relationship 
with scores on the Regents’ examination, the 
coefficients ranging from .70 to .79. It should 
further be noted that the addition of the Lit- 
erary Acquaintance and Literary Comprehen- 
sion parts of the Cooperative English battery 
would doubtless produce a considerable in- 
crease in the agreement of the scores for the 


TABLE III


RELATIONSHIP BETWEEN TEACHERS’ ESTIMATES OF COMPOSITION ABILITY AND SCORES ON THE USAGE PARTS OF FORMS OM AND 1937 OF THE COOPERATIVE ENGLISH TEST


                        Coefficients of Correlation between Teachers’
                        Estimates of Composition Ability and Usage Score
Grade and School  N     Form OM    Form 1937
Grade 10          32    .165       .394
                  24    .519       .706
Grade 11          43    .391       .362
                  34    .678       .772
                  63    .525       .554
Grade 12          40    .615       .425
                  34    .539       .554


* H, M, and L stand for high, medium, and low, and refer to the levels of native ability and socio-economic status in the schools.
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objective tests with these total scores for the 
essay-type test. 


3. A comparison of scores obtained by stu- 
dents on the all-objective and the proof- 
reading forms of the Cooperative English 
Test indicates that the results from these two 
types of test are in rather close agreement. 
Correlation coefficients range from .84 to .88 
when students’ scores are compared within 
each single grade from the 9th to the 12th.


4. An analysis of the scores of students 
who have been segregated into ability groups 
in two school systems shows a very high de- 
gree of relationship between the combined 
criteria which have been used for sectioning
purposes and scores on the Cooperative
English Test.


5. Analysis of the results for the vocabu- 
lary and usage parts of the Cooperative Eng- 
lish Test lends no support to the opinion that 
the all-objective, machine-scorable test puts 
a greater premium upon verbal intelligence 
than does the usual test situation. 
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6. High school teachers’ estimates of their 
students’ powers of oral and written expres- 
sion bear a varying but significantly positive 
relationship to success on the usage parts of 
the Cooperative English Test. The median of 
fourteen coefficients of correlation between 
these variables is .53. The two forms of the 
usage test show a similar degree of relation- 
ship to these estimates, with the Form 1937 
test correlations slightly but not significantly 
or consistently higher. 


If there were in existence or in immediate 
prospect an essay examination highly reliable 
not only in its scoring but in itself, it is 
doubtful that even the most ardent propo- 
nents of objective testing would hesitate to 
declare such a direct measure of English ex- 
pression superior to a more indirect measure. 
But at the present time, the objective form of 
examination is the only means by which one 
may discern the extent of English skills and 
powers with speed, with uniformity, and with 
the personal equation of the examiner oper- 
ating at a minimum. 
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A STUDY OF CERTAIN FACTORS INFLUENCING ACADEMIC 
ACHIEVEMENT WITH SPECIAL REFERENCE 
TO THE HEALTH FACTOR 


LOWELL N. DOUGLAS
Baylor University 


THE PROBLEM 


The problem concerns a study of certain 
factors thought to condition achievement in 
English at Baylor University with special 
reference to the health factor. 


PREVIOUS STUDIES 


A survey of literature reveals that for some 
time there has existed the belief that there is 
some direct relationship between mental and 
physical functioning. The general situation is 
presented in Tables I and II. 


TABLE I 


SUMMARY OF PREVIOUS INVESTIGATIONS OF 
PHYSICAL AND MENTAL RELATIONSHIPS 


Studies          No Relationship   Positive Relationship   Negative Relationship
Before 1900      6
1900-1910        8
1910-1920        20
1920-1930        18
1930-Present     6

Total            58


TABLE II


SUMMARY OF RELATED HEALTH FACTORS AND
MENTAL ABILITY


Studies                 No Relationship   Positive Relationship   Negative Relationship
Nutrition               3                 3                       1
Tonsils and Adenoids    3 0
Glandular Therapy       1                 0                       0
Intestinal Toxemia      0                 1                       0


The studies referred to in Tables I and II 
employed various methods in attempting to 
measure the health factor. Anthropometric 
measures were first used: weight and height
were used individually, then in combination;
soon sitting height was added as a separate
measure. On the basis of a great number of 
such measures, standards corresponding to 
age levels were worked out. Then came the 
idea of measuring vital capacity, breathing 
capacity or chest circumference, then expan- 
sion. This measure was usually added to the 
weight/height index. Vierordt’s formula for 
determining weight in connection with body 
length and chest circumference endeavored to 
present this combination mathematically. 
Some investigators believed that strength was 
a factor in physical well-being, and intro- 
duced the measures of grip, generally used 
with other measures. Naccarati evolved the 
morphologic index which interprets the body 
in terms of the length of the extremities and 
the volumetric value of the trunk. Students 
of anatomy introduced ossification ratios for 
use primarily in the measurement of children. 
Crampton, in studying boys, declared that 
the age of pubescence affected physical and 
mental health. Still other individuals felt 
that the mere listing of the number of physi- 
cal defects found in an individual would yield 
his health score. Several studies were con- 
ducted on this basis. Dr. Beyers of the 
United States Navy felt the inadequacy of 
previous attempts at measuring health and 
employed in addition to the weight/height 
index, Vierordt’s formula, etc., a physical con-
dition scale. Perfect health was rated at
100%, and the scale consisted of six sections. 
On this he, as examining physician, rated the 
individuals under observation. The measure 
was subjective, it is true; but it was the sub- 
jective judgment of authority based on thor- 
ough physical examination. This method he 
employed in 1900. Since that time many ad- 
ditions have been made to physical examina- 
tion procedure, making it more reliable and 
more objective. Such is the summary of the
measures employed in the previous studies of
health.
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THE PROCEDURE EMPLOYED IN THIS 
INVESTIGATION 


In the present study 109 freshman boys at 
Baylor University were enrolled in three Eng- 
lish classes under the direction of one teacher. 
The term consisted of daily one-hour classes,
five days a week, for twelve weeks. Five
measures were used in estimating achieve- 
ment in English for each of the participants: 
(1) teacher’s marks, (2) departmental test 
averages, (3) the scores on the Purdue Place- 
ment Test in English, Form B, administered 
at the end of the term, (4) the raw gain score 
as shown by the difference between the scores 
on Purdue, Form A, administered at the be-
ginning of the term, and Form B, admin-
istered at the conclusion of the term, and
(5) the percentage of gain as disclosed by 
Form A and Form B of the Purdue Test. 
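Measures (4) and (5) above are simple functions of the two Purdue scores. In the sketch below, the raw gain follows the text (Form B minus Form A). The percentage-of-gain formula, which divides the gain by the Form A score, is an assumption, because the article does not define it. The scores are hypothetical.

```python
def raw_gain(form_a: float, form_b: float) -> float:
    """Measure (4): end-of-term Form B score minus beginning-of-term Form A score."""
    return form_b - form_a

def percentage_gain(form_a: float, form_b: float) -> float:
    """Measure (5), ASSUMED here to be the raw gain as a percent of the Form A score."""
    if form_a == 0:
        raise ValueError("percentage gain is undefined when the initial score is 0")
    return 100 * raw_gain(form_a, form_b) / form_a

# Hypothetical Purdue scores for one student.
print(raw_gain(48, 60), percentage_gain(48, 60))
```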

Other factors considered in the study were 
initial English ability, measured by the Amer- 
ican Council on Education Cooperative Eng- 
lish Test, 1937, the general high school aver- 
age, the high school English average, the 
average daily study time, study habits as ana- 
lyzed by the Wrenn Study Habits Inventory, 
intelligence as measured by the American 
Council on Education Psychological Examina- 
tion for College Freshmen, 1937 Edition, 
reading comprehension as determined by the 
Iowa Silent Reading Test: Advanced, Form A 
(Revised), rate of silent reading as measured 
by the same test, leadership as indicated by 
the Morris Trait L by Elizabeth Morris, per- 
sonality as rated by the Bernreuter Person- 
ality Inventory, socio-economic status as de- 
termined by the Sims Score Card for Socio- 
Economic Status, social adjustment as esti- 
mated by the Washburne Social Adjustment 
Inventory, and chronological age. 

For the measurement of the health factor, 
health was considered from the physiological 
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viewpoint, the comprehensive medical exam- 
ination being employed. Much consideration 
was given to the determination of the com- 
prehensive physical examination form. Forms 
used in numerous university and health serv-
ice divisions, hospital and clinic forms, forms
used by individual physicians, and those used 
by major insurance companies were carefully 
studied. Then from a study of standard text- 
books in physical diagnosis and health exam- 
inations, the items for the health examination 
were determined, as well as the tests sug- 
gested for measuring the items listed. The 
form² thus constructed was then given author-
itative approval by a board of seven practic-
ing physicians. The university physician for
men administered all 109 of the examinations
in order to have a uniform evaluation of each
item. He was assisted in the non-technical 
parts by the university nurses. Students show- 
ing definite tendencies toward defects were 
fluoroscoped or x-rayed. The basal meta- 
bolism test necessitated a second appointment 
with each student. 

In addition to the physical examination administered to each participant at the beginning of the term, a weekly health examination was made. Because some health measures, such as temperature, pulse rate, and weight, fluctuate with individual habits and with the time of day at which they are recorded, each student's weekly appointment was scheduled for the same day and the same hour every week. The university physician administered the tests and recorded the individual's health history for the week, together with his personal observations. Twelve weekly recordings were made on the following weekly health examination form:


² The physical examination form used in the study is not included in this review for lack of space. Copies may be requested from the author.


First Week
Day of Week ______________    Time of Examination ______________    Throat ______________
Temperature ______________    Blood Pressure ______________
______________    After exercise ______________
Remarks:
Have you suffered from any of the following during the week?
Thoracic pains
Abdominal pains

EVALUATION OF STUDENT HEALTH ON THE
BASIS OF THE COMPREHENSIVE PHYSICAL
EXAMINATION AND THE WEEKLY PHYSICAL
EXAMINATION


At present there is no standard method of scoring a comprehensive physical examination, although several attempts have been made to score such examinations. One method, employed in several studies, scores the individual according to the number of physical defects the examination reveals. The unsoundness of this method is at once apparent, because the nature of a defect is of more significance than the number of defects. Another method, employed primarily by insurance companies, rates the individual's health by (1) the number of impairments, (2) impairments present, (3) impairments demanding medical treatment, and (4) impairments demanding immediate medical treatment and care. Varying interpretations of the four levels, and the fact that the impairments are concerned with longevity rather than with immediate health status and health function, make this plan impractical for use in the present study. Some theorists have suggested that values be assigned to the various systems of the human organism and that the individual's health be evaluated as the sum of these systemic evaluations. Since the body systems are so highly integrated and correlated that their separate functioning is impossible to ascertain, this measure seems unsound. In spite of the absence of a standard method of scoring physical examinations, all physicians make such evaluations when they say that a person is in “good health” or “poor health.”


In this study the ratings on the health form were established by appeal to authoritative opinion. First, it was assumed that there are at least six distinguishable classes of individual health status: (1) very good, (2) good, (3) moderate, (4) poor, (5) bad, and (6) very bad. We have identified these classes by the letters A, B, C, D, E, and F, thereby avoiding the assignment of dubious numerical values. Each member of a group of five physicians was given the comprehensive physical examination form and asked to indicate just what conditions should be present to warrant ratings of A, B, C, D, E, and F. After an opportunity for individual study, the physicians met as a group, discussed their independent classifications, and listed the functionings and defects that would place an individual in each class. The comprehensive physical examination of each student participating in the study was then classified according to the rating scale determined by this authoritative judgment. Table III shows the distribution of the students according to the comprehensive physical examination rating:


TABLE III 


DISTRIBUTION OF STUDENTS ACCORDING TO THE 
PHYSICAL EXAMINATION RATING 


Number of 
Students 


CRITERION FOR CLASSIFYING SUBJECTS
ACCORDING TO THE PHYSICAL
EXAMINATION FORM

Class A 

The normal cardiac measurements set up for Class A were taken from a recognized standard textbook of medicine. The heart measurements were ascertained by percussion and palpation; in any case of doubt as to the size of the heart, fluoroscopy was carried out for confirmation. Normal heart sounds were necessary for this classification; cases with functional murmurs were excluded. The pulse and its response to examination were required to be normal according to accepted standards. Normal chest findings were required; all doubtful cases were fluoroscoped or x-rayed. No case presenting any weakness in the inguinal canal was included in this class. Vision in both eyes was required to be 20/20 as tested by the Snellen chart, and cases presenting color blindness were excluded. The internal and external examination of the nose was required to be normal. Students who had tonsils were not included in this group. Hearing in both ears was regarded as normal if conversational tones were heard at 50 feet. Weight and height were important factors in this class; cases presenting wide variations were excluded. The nervous and osseous systems were regarded as normal if no defect was found. The urine was required to be free of albumin and sugar and negative on microscopic examination.


Class B 


The main difference between Class A and Class B does not lie in the cardiac or lung findings; in fact, the two groups differ very little in their health findings. Patients in this group were required to have normal heart and chest findings. Any hernia excluded the patient from the group. Patients with 20/20 vision in both eyes after slight correction were included in this class, as were those with slight septal deviation of the nose. A history of occasional colds placed a patient in Class B.


Class C 


The heart and lung findings were required to be normal, as set up for Class A. A hernia that had been repaired at least eighteen months earlier was admitted to this group. Cases with vision of 20/20 after correction of a mild hyperopia or myopia were placed in this group. Any septal deviation with evidence of accessory sinus infection or polypi placed the patient in this group. The presence of frequent colds in the winter was taken into consideration in this class. Tonsils and adenoids with evidence of gross infection were sufficient to place the patient in the group. Slightly underweight individuals were placed in this group. A history of rheumatic fever in childhood was regarded as sufficient evidence for this class. The main differences between this class and Class B lie in the mild refractive errors of the eyes, the history of frequent colds with evidence of accessory sinus disease, the presence of infected tonsils, and the history of rheumatic fever or childhood tuberculosis.


Class D 


Blood pressure over 130 mm (systolic) after repeated examination, with normal cardiac measurements, was included in this class. Any history of chronic bronchitis was sufficient evidence for inclusion in this group. Uncorrected hernias were rated Class D. Vision of less than 20/20 even with correction was included in this class. Congenital cataract, loss of one eye, corneal opacities, and progressive myopia were classed in this group. Marked septal deviation with loss of aeration and with polypi present was sufficient for placement here. Previous ear infections with subsequent mastoid infections were included, as were repeated respiratory infections.


Class E 


One difference between Class D and Class E lay in the blood pressure findings: any blood pressure over 140 mm (systolic) was included in this group. A slight increase in heart measurements with the presence of organic heart murmurs was evidence for inclusion here. Evidence of arrested tuberculosis or asthma placed the patient in this class. A history of peptic ulcer, with or without physical findings, placed the patient in this group. Uncorrected and scrotal herniae were placed here. Marked deafness in one or both ears was evidence for inclusion here. High-grade defects of vision were placed here. Mild renal diseases were placed in this group, as were definite changes in the basal metabolic rate (B. M. R.) findings.


Class F 


This group comprised all the high-grade defects found among the participants in the study. Any hyperpiesia (high blood pressure) over 150 mm was included here. Mild decompensating heart disease was placed here. Active tuberculosis was sufficient for inclusion. Diabetes mellitus, cardiovascular and renal disease, and severe nervous disorders were placed in this class.
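The physicians applied these class criteria by judgment, not by formula. Purely as an illustration, the sketch below encodes the one fully numerical rule in the text (the systolic blood pressure thresholds for Classes D, E, and F). It assumes, as the class descriptions suggest but never state outright, that a student's overall rating is the most severe class triggered by any single finding. Both function names are hypothetical.

```python
# Hypothetical sketch: classes run from A (very good) to F (very bad).
CLASSES = "ABCDEF"


def class_from_systolic(systolic_mm: float) -> str:
    """Class implied by repeated systolic blood pressure readings alone.

    Thresholds follow the text: over 130 mm -> D, over 140 mm -> E,
    over 150 mm -> F. Lower readings impose no restriction (A).
    """
    if systolic_mm > 150:
        return "F"
    if systolic_mm > 140:
        return "E"
    if systolic_mm > 130:
        return "D"
    return "A"


def overall_class(finding_classes: list[str]) -> str:
    """Assumed rule: the overall rating is the worst class among the findings."""
    return max(finding_classes, key=CLASSES.index, default="A")


# Example: infected tonsils (Class C) plus a systolic pressure of 135 mm.
print(overall_class(["C", class_from_systolic(135)]))  # D
```

A real rating would also have to combine findings that carry no number, such as a history of rheumatic fever, and the panel's criteria do not reduce to a single rule of this kind.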


In a similar manner various functions and 
defects were listed which would place an in- 
dividual in the various classes according to 
the weekly physical examination. Classifica- 
tions of the twelve weekly physical examina- 
tions for each student reveal the following 
distribution: 


TABLE IV

DISTRIBUTION OF STUDENTS ACCORDING TO
WEEKLY PHYSICAL EXAMINATION RATING

Class    Number of Students
CRITERION FOR CLASSIFYING SUBJECTS ON
THE BASIS OF WEEKLY PHYSICAL
EXAMINATIONS

Class A

1. Normal temperature throughout the experiment
2. Normal throat throughout the experiment
3. Normal blood pressure throughout the experiment
4. Normal pulse and response throughout the experiment
5. Maintenance of normal weight or increase in weight
6. No illness of any sort


Class B

1. Normal temperature throughout the experiment
2. Normal throat throughout the experiment
3. Normal blood pressure throughout the experiment
4. Normal pulse and response
5. Maintenance of normal weight
6. History of one cold during the experiment, with recovery in 2 weeks; occasional headache


Class C

1. Normal temperature throughout the experiment
2. Occasional simple naso-pharyngitis with recovery within 1 week
3. Normal blood pressure
4. Pulse normal or very slightly elevated
5. Maintenance of normal weight
6. History of 2 or more colds during the experiment, with recovery in each instance within two weeks
7. Vision disturbances

Class D

1. Normal temperature during the experiment
2. Normal blood pressure
3. Frequent attacks of naso-pharyngitis during the experiment
4. Pulse consistently elevated above 90
5. Abnormal loss of weight or abnormal gain of weight
6. History of frequent colds with delayed recovery, or other minor illnesses
Class E

1. Variation in blood pressure with increase in both systolic and diastolic of 10 points
2. Temperature elevated with explainable cause
3. Frequent attacks of naso-pharyngitis; chronic cough or hoarseness
4. Abnormal loss or gain of weight
5. Pulse consistently elevated over 100
6. History of constant colds, with accompanying sinusitis
7. Confinement in bed with illness not lasting over 10 days

Class F

1. Variation in blood pressure with increase in both systolic and diastolic above 150
2. Temperature consistently elevated
3. Frequent attacks of naso-pharyngitis with chronic cough; hemoptysis
4. Major illness during the experiment; confinement in bed for the remainder of the term
5. Pulse consistently over 100
6. Constant colds with severe sinusitis
7. Great loss or gain of weight


STATISTICAL ANALYSIS OF DATA 


To test the potency of each of the several factors that may or may not be present in the kind of learning situation studied, as measured by the five criteria of learning success, the simple correlation between each of the twenty factors and each of the criteria was calculated. The resulting coefficients of correlation, together with their probable errors and the number of paired observations on which each coefficient is based, are listed in Table V.
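Table V reports two quantities: the product-moment coefficient r and its probable error. The sketch below computes both. The study does not print its formula for the probable error, so the sketch assumes the conventional large-sample expression of the period, PE = 0.6745(1 − r²)/√N. The function names and sample scores are illustrative only.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation between paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of r: 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Example with made-up scores (not data from the study).
health = [3, 5, 4, 6, 2]
english = [70, 82, 75, 88, 65]
r = pearson_r(health, english)
print(round(r, 3), round(probable_error_r(r, len(health)), 3))
```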


INTERPRETATION OF RESULTS 


Within each row of Table V, the factor having the highest coefficient of correlation with the corresponding criterion has, taken singly, the highest predictive value of the factors studied. The following observations may be drawn from the data:

1. When the consistency of the data is 
considered, we observe that three of the 
criteria—teacher’s marks, departmental test 
averages, and the Purdue Test Form B—are 
highly comparable measures. 
2. When the percentage of gain in achievement is used as the achievement criterion, we find considerable change in the size of the coefficients. This change is probably due to the fact that certain known factors in achievement, such as intelligence, initial ability, and high school grades, are partially controlled, which places the students on a more nearly equal basis.


3. The table of coefficients of correlation seems to indicate that the raw score difference between pre-test and final test is a poor measure of achievement. That the coefficients are almost wholly negative is due to the severe handicap placed on the brighter students, who could not possibly make the gain that is possible for the poorer students. It is difficult to improve at a rapid rate when one starts very high on the learning curve.
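Two hypothetical students make the ceiling effect described in items 2 and 3 concrete. The sketch assumes that "percentage of gain" means the gain expressed as a percentage of the gain still possible, which is the reading suggested by Conclusion 10. It also assumes a test maximum of 100. The scores are invented.

```python
def raw_gain(pre: float, final: float) -> float:
    """Raw score difference between pre-test and final test."""
    return final - pre


def percent_of_possible_gain(pre: float, final: float, max_score: float) -> float:
    """Gain as a percentage of the gain still possible above the pre-test score."""
    if pre >= max_score:
        raise ValueError("pre-test score leaves no room for gain")
    return 100.0 * (final - pre) / (max_score - pre)


MAX_SCORE = 100  # assumed test maximum, for illustration only
bright = (85, 95)  # (pre-test, final test), hypothetical
weaker = (40, 60)

for label, (pre, final) in [("bright", bright), ("weaker", weaker)]:
    print(label, raw_gain(pre, final), round(percent_of_possible_gain(pre, final, MAX_SCORE), 1))
# bright 10 66.7
# weaker 20 33.3
```

The weaker student shows twice the raw gain of the bright student but only half the percentage of possible gain. Raw score gain therefore favors the students who start low.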


4. The factors having the greatest pre- 
dictive value for achievement in Freshman 
English as measured by teacher’s marks, de- 
partmental test averages, and the Purdue 
Test Form B are initial ability in English, 
intelligence, reading comprehension and high 
school records, both general and English 
averages. We might say that these factors are 
similar and closely related if the consistency 
of the data is considered. 


5. Both measures of health show con- 
sistently high coefficients of correlation with 
all of the criteria. It is of particular sig- 
nificance that weekly health status is a very 
important factor in English achievement 
when measured by departmental test aver- 
ages, and of almost equal significance is the 
health factor measured by the comprehensive 
physical examination when correlated with 
percentage of gain in achievement. 


6. Using r = .50, either health measure alone gives an index of forecasting efficiency $E = 1 - k = .13$, where $k = \sqrt{1 - r^2}$ is the coefficient of alienation. This means that by using a prediction equation involving the health measure, we can predict achievement 13% better than by chance selection. Such factors as intelligence, initial ability in English, high school average, and reading comprehension correlate with the criteria with an r of approximately .75, giving $E = 1 - k = .34$. Thus the health measures are about one-third as efficient in predicting achievement as the traditional and commonly accepted “best” measures.
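The two figures in item 6 can be checked directly. The sketch below computes E = 1 − √(1 − r²) for the two correlations quoted, and after rounding it reproduces .13 and .34. The function name is illustrative.

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Index of forecasting efficiency E = 1 - k, with k = sqrt(1 - r**2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    alienation = math.sqrt(1.0 - r * r)  # coefficient of alienation k
    return 1.0 - alienation


for r in (0.50, 0.75):
    print(r, round(forecasting_efficiency(r), 2))
# 0.5 0.13
# 0.75 0.34
```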
7. Relative to criterion D (percentage of gain in achievement), the health factor (physical examination) is just as efficient (barring chronological age) as the best of the traditional measures.

8. The two health measures and chrono- 
logical age are the only three measures that 
are significantly associated with raw score 
gain. 

9. Intelligence, high school average in English, and general high school average seem to have a similarly high predictive value when criteria A, B, and C are used, and a slightly less significant value when D is used, but when E is used their predictive value is negligible.

10. Initial ability in English as measured 
by the Cooperative English Test has an 
extremely high predictive value when criteria 
A and B are used, and the highest of all the 
factors when criterion C is employed. It also 
correlates significantly with criterion D, but 
is of no significance when raw score gain is 
used. 


11. Reading comprehension has an important predictive value when criteria A, B, and C are used as achievement measures; it has a positive value when percentage of gain is used, but is of no significance when raw score gain is used.


12. Study habits and reading rate have a positive predictive value except when raw score gain is used as the criterion; the most significant predictive relationship occurs when Purdue Form B is used as the measure of achievement.


13. Average study time as reported by the students shows no significant relationship with any criterion used.


14. Apparently, the older pupils tend to 
make larger raw score and percentage gains 
but receive lower final test scores, depart- 
mental test averages, and teacher’s marks 
than do the younger pupils. 

15. Leadership as measured by the Morris 
Trait L has a positive predictive value with 
all the criteria except raw score gain. 

16. Socio-economic status seems to have 
no predictive value for achievement in Eng- 
lish as measured by the five criteria probably 
because of the homogeneity of the group. 

17. Social adjustment seems to have no 
predictive value when correlated with the five 
criteria. 


TABLE V 
18. With the exception of B2-S (Self-Sufficiency) and F2-S (Sociability), none of the Bernreuter traits shows any significant relationship with the five criteria.


Table VI shows the relative efficiency of the factors, taken singly, in predicting achievement in English as measured by the five criteria. The Purdue Test Form B, teacher's marks, and the departmental test averages are considered comparable measures of achievement in English, as the consistency of the data shows. When the average rank of each factor across these three comparable criteria is computed, the factors fall in the order indicated in Column I. When the percentage of gain is used as the criterion of achievement in English, the factors rank in the order indicated in Column II. Column III indicates the rank of the factors when raw score gain is the criterion.
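Column I orders the factors by their average rank over the three comparable criteria. The sketch below shows that computation under two assumptions: each factor is ranked by its correlation within each criterion (1 = highest), and tied coefficients share the mean of their ranks. Neither the tie rule nor the example correlations come from Table V.

```python
def ranks_descending(coeffs: dict[str, float]) -> dict[str, float]:
    """Rank factors by correlation, 1 = highest; ties share their mean rank."""
    ordered = sorted(coeffs, key=coeffs.get, reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any run of factors tied with ordered[i].
        while j + 1 < len(ordered) and coeffs[ordered[j + 1]] == coeffs[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def order_by_average_rank(criteria: list[dict[str, float]]) -> list[str]:
    """Order factors by their mean rank across several criteria."""
    tables = [ranks_descending(c) for c in criteria]
    factors = list(criteria[0])
    average = {f: sum(t[f] for t in tables) / len(tables) for f in factors}
    return sorted(factors, key=average.get)


# Hypothetical coefficients for three comparable criteria.
teacher_marks = {"Intelligence": 0.70, "Health": 0.50, "Study time": 0.05}
dept_tests = {"Intelligence": 0.65, "Health": 0.55, "Study time": 0.10}
purdue_b = {"Intelligence": 0.60, "Health": 0.60, "Study time": 0.00}
print(order_by_average_rank([teacher_marks, dept_tests, purdue_b]))
# ['Intelligence', 'Health', 'Study time']
```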


The health relationship with high school 
English average, general high school average, 
and intelligence is sufficiently high to say 
that the correlation trend is positive in each 
case. 
PARTIAL CORRELATIONS


1. Achievement in English as measured by the Purdue Test Form B with health when intelligence is held constant: r = .25.

2. Achievement as measured by the Purdue Test Form B with intelligence when health is held constant: r = .79.

Thus we see, as the partial coefficients of correlation indicate, that intelligence as measured by the American Council Psychological Test is a more important predictive factor than health, but it is also evident that health is a positive factor.

3. Achievement (as measured by the percentage of gain between Purdue Test Form A and Purdue Test Form B) with health when intelligence is held constant: r = .50.

4. Achievement (as measured by the percentage of gain between Purdue Test Form A and Purdue Test Form B) with intelligence when health is held constant: r = .32.

Again we see from the partial coefficients that health becomes an even more important factor than intelligence when the percentage of gain is used as the criterion for achievement.
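Partial coefficients like those above are conventionally obtained from the first-order formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√((1 − r₁₃²)(1 − r₂₃²)). The zero-order coefficients behind the study's values are not reproduced here, so the inputs below are placeholders only.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation of variables 1 and 2 with 3 held constant."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("variable 3 is perfectly correlated with 1 or 2")
    return (r12 - r13 * r23) / denom


# Placeholder zero-order coefficients (not the study's values):
# 1 = achievement, 2 = health, 3 = intelligence.
r_ach_health, r_ach_intel, r_health_intel = 0.50, 0.80, 0.30
print(round(partial_r(r_ach_health, r_ach_intel, r_health_intel), 2))  # 0.45
```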


TABLE VI

Rank   Column I               Column II              Column III
  1    H. S. Eng. Ave.        Chron. Age             Chron. Age
  2    Initial Eng. Abil.     H. S. Eng. Ave.        Weekly Health Status
  3    *Intelligence          Intelligence           Health (Phys. Ex.)
  4    *Gen. H. S. Ave.       Health (Phys. Ex.)     Ave. Study Time
  5    Reading Compreh.       Initial Eng. Abil.     Leadership
  6    Weekly Health S.       Gen. H. S. Ave.        Socio-Econ. S.
  7    Health (Phys. Ex.)     Reading Compreh.       Study Habits
  8    Reading Rate           Weekly Health S.       Gen. H. S. Ave.
  9    Study Habits           Reading Rate           H. S. Eng. Ave.
 10    Leadership             Leadership             Dominance-Submis.
 11    Self-Suffic.           Study Habits           Social Adjust.
 12    Sociability            Self-Suffic.           *Intelligence
 13    Introver.-Extro.       Sociability            *Confidence
 14    Socio-Econ. S.         Socio-Econ. S.         *Self-Suffic.
 15    Social Adjust.         Introver.-Extro.       Introver.-Extro.
 16    *Neurotic Tend.        *Neurotic Tend.        Reading Rate
 17    *Confidence            *Social Adjust.        Initial Eng. Abil.
 18    Dominance              Ave. Study Time        Reading Compreh.
 19    Ave. Study Time        Confidence             Neurotic Tend.
 20    Chron. Age             Dominance              Sociability


* A few intercorrelations were computed in order to determine more significant results.


Measures

Intelligence with Health (Phys. Exam.)
Intelligence with Chronological Age
Intelligence with High School Eng. Ave.
Health (Phys. Exam.) with Chron. Age
Health (Phys. Exam.) with H. S. Eng. Ave.
Health (Phys. Exam.) with Gen. H. S. Ave.
Gen. H. S. Ave. with H. S. Eng. Ave.
If we turn to the high school average in English as a measure of academic achievement in English, we find the following significant partial coefficients of correlation:

5. Health and high school English average with intelligence held constant: r = .3.

6. High school English average and intelligence with health held constant: r = .9.

Again we see that health is a positive factor in achievement, since the size of its coefficient indicates a positive correlation trend.


CONCLUSIONS 


The author is fully cognizant of the several 
limitations present in this study, such as 
small sample, differences in opinion as to the 
real meaning of the words health and achieve- 
ment, the health measures, and the limitation 
present when the correlation technique is 
used; but in spite of such limitations there 
seem to be several significant findings. 


1. Any attempt to measure academic 
achievement should come only after a careful 
study of the meaning of the word since there 
are so many possible criteria. Five such 
criteria were used in the present study, and 
there are three outstanding differences evi- 
denced when the consistency of the data is 
observed: status, percent of improvement, 
and raw score improvement. 

2. When academic achievement in English 
is considered from the “status” viewpoint, 
the factors of intelligence, initial ability in 
English, general high school average, high 
school English average, and reading compre- 
hension offer the best predictive indices, with 
initial ability in English as the best index. 

3. Since college freshmen usually represent 
a rather homogeneous group, social and 
economic factors in achievement, as measured 
by the tests employed in this study, are of 
little predictive value. 


4. The coefficients of correlation between 
the two health measures and the five criteria 
of achievement in English are consistently 
higher than the correlations reported in 
similar studies, probably because of the 
comprehensive health measures employed 
here. 

5. When an individual’s health is studied 
from the beginning to the end of a school 
term, his health status during that period is 
significantly correlated with his achievement. 
6. If achievement is measured by departmental test averages alone, then the health of the individual at the time of the tests is one of the most important factors conditioning his achievement. There is a possible neurological explanation for such a relationship.

7. Health status (as measured by the 
comprehensive physical examination) and 
health function (as measured by the weekly 
health examination) are as consistently high 
in predicting achievement as any of the 
factors included in this study. 


8. If intelligence (as measured by a 
standard intelligence test), general high 
school average, high school English average, 
and reading comprehension are considered as 
a single measure, since they include so many 
common elements, then the second most 
important factor in predicting achievement 
in English is health. 


9. A pre-test, such as the Purdue Test or the Cooperative English Test, is of definite value in grouping students according to their probable achievement in freshman English, in order to eliminate the teaching waste usually associated with a wide spread of abilities.


10. If achievement is to be measured in 
terms of individual improvement, then the 
percent of gain made in terms of possible 
gain is a reliable measure. On the other hand, 
the raw score difference between a pre-test 
and a final test is not a good measure of 
improvement because of the inconsistency of 
the gap occurring in the distribution curves. 


11. When the twenty measures employed in the present study are considered in rank order as they affect achievement, the two health measures rank above all others with the exception of the conventionally accepted measures of intelligence, high school average, and aptitude.


12. The older students, in this study, tend 
to make poorer final grades on the course, 
possibly because of retardation in high school 
or delayed college entrance, but they tend 
to make greater improvement because of low 
initial ability and greater application. 


As a result of this study, the following 
recommendations are ventured: 


1. Because of the apparent health factor in scholastic achievement, colleges and universities should use their health service departments for educational purposes by making health ratings of individual students available to members of the faculty.


2. In studies where a measure of health 
is to be employed, the comprehensive physical 
examination or a health case study should 
be used instead of the indices used so often 
in the past, such as height, weight, etc. 

3. In studies of the factors conditioning achievement, the measures should be grouped into categories, because a group of isolated measures contains many common elements.
4. Medically trained individuals should attempt to discover methods whereby individuals can be rated objectively on comprehensive examinations.


5. In order to determine more valid con- 
clusions, studies similar to this should be 
made by other investigators with different 
groups of students living and working under 
conditions different from those in this in- 
vestigation. With larger samples there would 
be a sufficiently high number of cases in each 
health category to study the characteristics 
associated with it. 
Suppose the correlation of two tests within the range of one grade is known, and one of the standard deviations for the range of several grades is available. The correlation in the larger range is then usually estimated through the following formula:

$$\sigma_y^2\,(1 - r_o^2) = \Sigma_y^2\,\bigl(1 - R_{o(y)}^2\bigr) \qquad (1)$$

where $r_o$ is the correlation obtained in the small range; $R_{o(y)}$, the correlation estimated for the large range from the y-variable; $\sigma$, the standard deviation in the small range; and $\Sigma$, the standard deviation in the large range.

This formula was derived by Kelley¹ under three assumptions:

- the correlation in the small range results from curtailing the distribution of the x-variable in the scatter diagram for the large range;
- this curtailment affects the y-variable only in a consequential manner;
- the y-arrays are homoscedastic and show rectilinear regression in the scatter diagram for the large range.

Under these assumptions, the curtailment does not change the slope of the line through the means of the y-arrays, and the regression coefficients for both ranges are equal. Hence,

$$r_o\,\frac{\sigma_y}{\sigma_x} = R_{o(y)}\,\frac{\Sigma_y}{\Sigma_x} \qquad (2)$$

Squaring (2) and dividing by (1),

$$\frac{r_o^2}{\sigma_x^2\,(1 - r_o^2)} = \frac{R_{o(y)}^2}{\Sigma_x^2\,\bigl(1 - R_{o(y)}^2\bigr)} \qquad (3)$$

This formula is useful in estimating $R_{o(y)}$ when $\Sigma_x$, the standard deviation of the variable whose distribution is assumed to be curtailed, is available. It was originally derived by Pearson² in a different manner, and given later by Kelley³ in the present form.
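The sketch below gives the two y-based estimates, obtained by solving (1) and (3) for $R_{o(y)}$. From (1), $R_{o(y)} = \sqrt{1 - (\sigma_y^2/\Sigma_y^2)(1 - r_o^2)}$. From (3), $R_{o(y)} = r_o P / \sqrt{1 - r_o^2 + r_o^2 P^2}$, where $P = \Sigma_x/\sigma_x$. The function names are hypothetical. The standard deviations in the example are invented and chosen so that (1) and (3) are mutually consistent, in which case the two estimates coincide.

```python
import math


def estimate_from_eq1(r_small: float, sd_y_small: float, sd_y_large: float) -> float:
    """Solve (1) for R_o(y) using only the y standard deviations (returns |R|)."""
    ratio_sq = (sd_y_small / sd_y_large) ** 2
    inside = 1.0 - ratio_sq * (1.0 - r_small ** 2)
    if inside < 0:
        raise ValueError("inputs imply a negative squared correlation")
    return math.sqrt(inside)


def estimate_from_eq3(r_small: float, sd_x_small: float, sd_x_large: float) -> float:
    """Solve (3) for R_o(y) using only the x standard deviations."""
    p_sq = (sd_x_large / sd_x_small) ** 2
    return r_small * math.sqrt(p_sq) / math.sqrt(1.0 - r_small ** 2 + r_small ** 2 * p_sq)


# Invented figures: r_o = .50, the large range doubles the SD of x, and
# Sigma_y / sigma_y = sqrt(1.75) was chosen so that (1) and (3) agree.
r_o = 0.50
print(round(estimate_from_eq1(r_o, 1.0, math.sqrt(1.75)), 4))  # 0.7559
print(round(estimate_from_eq3(r_o, 1.0, 2.0), 4))              # 0.7559
```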


¹ Kelley, T. L., Statistical Method, p. 224. New York: The Macmillan Co., 1923.

² Pearson, K., On the Influence of Natural Selection on the Variability and Correlation of Organs, Phil. Trans. Roy. Soc. of London, A, Vol. CC, p. 23, 1902.

³ Kelley, T. L., Op. cit., p. 223.


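If the distribution of the y-variable is assumed to be curtailed while that of the x-variable is assumed to be rectilinear and homoscedastic, the equations corresponding to (1), (2), and (3) are:

$$\sigma_x^2\,(1 - r_o^2) = \Sigma_x^2\,\bigl(1 - R_{o(x)}^2\bigr) \qquad (1a)$$

$$r_o\,\frac{\sigma_x}{\sigma_y} = R_{o(x)}\,\frac{\Sigma_x}{\Sigma_y} \qquad (2a)$$

$$\frac{r_o^2}{\sigma_y^2\,(1 - r_o^2)} = \frac{R_{o(x)}^2}{\Sigma_y^2\,\bigl(1 - R_{o(x)}^2\bigr)} \qquad (3a)$$

where the notation is the same as before, but in terms of the x-variable.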


Textbooks in educational statistics do not agree as to which of the four equations (1), (1a), (3), or (3a) should be used to estimate the correlation in a wide range from that obtained in a narrow range. Garrett⁴ gives only (1) and (1a), while Holzinger⁵ omits these and recommends the use of (3) and (3a). The common practice is to use (1) when only $\Sigma_y$ is known and (1a) when only $\Sigma_x$ is available. Equations (3) and (3a) have not been so widely used, perhaps because the former two are simpler. But it is evident that unless

$$\frac{\Sigma_x}{\sigma_x} = \frac{\Sigma_y}{\sigma_y},$$

(1) and (1a) will yield different results, and that the same is true of (3) and (3a). Moreover, if the y-variable is not strictly rectilinear throughout both ranges, the values obtained from (1) and (3) will differ; and unless the x-variable is strictly rectilinear throughout both ranges, (1a) and (3a) will yield differing results. Therefore, when both $\Sigma_x$ and $\Sigma_y$ are known and the distributions do not fulfil the two foregoing conditions, there will be four different solutions to one and the same problem. In the absence of a method providing a unique solution, criteria must be established for selecting the variable whose distribution is assumed to be rectilinear and homoscedastic. Furthermore, it is still possible that neither the x-distribution nor the y-distribution is truly linear and homoscedastic in the sense that the value of $R_o$ shall lie within the error of random sampling. The purpose of this article, then, is to propose a test of the assumptions of rectilinearity and homoscedasticity throughout both ranges. The test should enable one either to choose the most suitable variable on which to base a fair estimate of $R_o$, or to conclude that none of the variables can provide a solution within the error of random sampling.

⁴ Garrett, H. E., Statistics in Psychology and Education, p. 304. New York: Longmans, Green and Co., 1937.

⁵ Holzinger, K., Statistical Methods for Students of Education, p. 172. New York: Ginn and Co., 1928.
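The four-way disagreement can be seen concretely. The sketch below evaluates (1), (1a), (3), and (3a) on one invented set of figures in which Σ_x/σ_x ≠ Σ_y/σ_y, solving each equation for the large-range correlation. The helper names are hypothetical.

```python
import math


def from_eq1_form(r: float, sd_small: float, sd_large: float) -> float:
    """(1) or (1a): R = sqrt(1 - (sd_small/sd_large)**2 * (1 - r**2))."""
    return math.sqrt(1.0 - (sd_small / sd_large) ** 2 * (1.0 - r * r))


def from_eq3_form(r: float, sd_small: float, sd_large: float) -> float:
    """(3) or (3a): R = r*P / sqrt(1 - r**2 + r**2 * P**2), P = sd_large/sd_small."""
    p = sd_large / sd_small
    return r * p / math.sqrt(1.0 - r * r + r * r * p * p)


def four_estimates(r, sd_x, sd_y, big_sd_x, big_sd_y) -> dict[str, float]:
    """Large-range correlation from each of the four textbook equations."""
    return {
        "(1)": from_eq1_form(r, sd_y, big_sd_y),   # x curtailed; uses y SDs
        "(1a)": from_eq1_form(r, sd_x, big_sd_x),  # y curtailed; uses x SDs
        "(3)": from_eq3_form(r, sd_x, big_sd_x),   # x curtailed; uses x SDs
        "(3a)": from_eq3_form(r, sd_y, big_sd_y),  # y curtailed; uses y SDs
    }


# Invented figures: r_o = .50, small-range SDs of 1, large-range SDs of 2 (x) and 1.5 (y).
for eq, value in four_estimates(0.50, 1.0, 1.0, 2.0, 1.5).items():
    print(eq, round(value, 3))
# (1) 0.816
# (1a) 0.901
# (3) 0.756
# (3a) 0.655
```

When the two ratios Σ/σ are made equal, (1) agrees with (1a) and (3) agrees with (3a), as the text states.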

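Imagine a scatter diagram for a large range in which the distribution of the y-variable is strictly rectilinear and homoscedastic. Let $r_{o(y)}$ be the correlation within the small range, and $R_{o(y)}$ the correlation for the large range. Equation (1) may then assume the following form:

$$\frac{\Sigma_y^2}{\sigma_y^2}\,R_{o(y)}^2 = \frac{\Sigma_y^2}{\sigma_y^2} - 1 + r_{o(y)}^2$$

But by (2), $\dfrac{\Sigma_y^2}{\sigma_y^2}\,R_{o(y)}^2 = \dfrac{\Sigma_x^2}{\sigma_x^2}\,r_{o(y)}^2$, so that

$$r_{o(y)} = \sqrt{\frac{\Sigma_y^2/\sigma_y^2 - 1}{\Sigma_x^2/\sigma_x^2 - 1}}$$

For the sake of simplicity, let

$$\frac{\Sigma_x^2}{\sigma_x^2} = P_x^2 \quad\text{and}\quad \frac{\Sigma_y^2}{\sigma_y^2} = P_y^2;$$

then

$$r_{o(y)} = \frac{\sqrt{P_y^2 - 1}}{\sqrt{P_x^2 - 1}} \qquad (4)$$

If, instead, the distribution of the x-variable is the one assumed to be rectilinear and homoscedastic, the corresponding equation is

$$r_{o(x)} = \frac{\sqrt{P_x^2 - 1}}{\sqrt{P_y^2 - 1}} \qquad (4a)$$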
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rey) is a function of the four standard 
deviations and it imposes the necessary con- 
ditions for the exact estimate of R.,,) under 
the assumptions of rectilinearity and homos- 
cedasticity in the distribution of the y— vari- 
able throughout both ranges. The same asser- 
tion holds for r.,x, in respect to the x 
variable. 
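Equations (4) and (4a) can be evaluated directly from the two ratios $P_x$ and $P_y$. The Python sketch below is a minimal illustration, not part of the original article. The function names are hypothetical, and the remark that a value above 1 cannot be a correlation is an added check. It uses the ratios from the practical illustration at the end of this section.

```python
import math

def r_small_y(p_x: float, p_y: float) -> float:
    """Equation (4): small-range correlation implied when the y-variable is
    rectilinear and homoscedastic throughout both ranges."""
    return math.sqrt(p_y**2 - 1) / math.sqrt(p_x**2 - 1)

def r_small_x(p_x: float, p_y: float) -> float:
    """Equation (4a): the same, with the x-variable as the assumed one."""
    return math.sqrt(p_x**2 - 1) / math.sqrt(p_y**2 - 1)

if __name__ == "__main__":
    p_x, p_y = 1.600, 1.200   # ratios used in the article's illustration
    for name, r in (("r_o(y)", r_small_y(p_x, p_y)), ("r_o(x)", r_small_x(p_x, p_y))):
        # Added check: a value above 1 cannot be a correlation coefficient.
        note = "" if r <= 1 else "  (exceeds 1: not admissible as a correlation)"
        print(f"{name} = {r:.4f}{note}")
```

For these ratios (4) gives about .531. Equation (4a) is its reciprocal and gives about 1.88, so here only the assumption about the y-variable yields an admissible value.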


If the correlation in the large range is known, say the correlation of two tests within the range of several grades, the correlation for the range of one grade may be determined from equations (1) or (3). This would be a correction for heterogeneity of the population. Let y be the variable whose distribution is rectilinear and homoscedastic. Substitute the value of $r_{o(y)}$ given by (4) for $r_o$ in (2), and write $R_{o(y)}$ there in place of $R_o$. This gives

$$R_{o(y)} = \frac{P_x\sqrt{P_y^2-1}}{P_y\sqrt{P_x^2-1}} \tag{5}$$

When the estimate is based on the x-variable, the corresponding equation is

$$R_{o(x)} = \frac{P_y\sqrt{P_x^2-1}}{P_x\sqrt{P_y^2-1}} \tag{5a}$$

As in (4) and (4a), (5) and (5a) impose the necessary conditions for the correct calculation of $r_{o(y)}$ and $r_{o(x)}$ respectively from $R_{o(y)}$ and $R_{o(x)}$ under the stated assumptions.
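The same style of sketch covers (5) and (5a). Dividing (5) by (4) shows that $R_{o(y)} = r_{o(y)}P_x/P_y$, and (5a) is the reciprocal of (5). The Python code below checks both identities numerically. The function names are hypothetical, and it assumes the illustration's ratios.

```python
import math

def small_r_y(p_x, p_y):
    """Equation (4), repeated so that this block runs on its own."""
    return math.sqrt(p_y**2 - 1) / math.sqrt(p_x**2 - 1)

def big_r_y(p_x, p_y):
    """Equation (5): large-range correlation, y assumed rectilinear and homoscedastic."""
    return p_x * math.sqrt(p_y**2 - 1) / (p_y * math.sqrt(p_x**2 - 1))

def big_r_x(p_x, p_y):
    """Equation (5a): the same with the x-variable as the assumed one."""
    return p_y * math.sqrt(p_x**2 - 1) / (p_x * math.sqrt(p_y**2 - 1))

if __name__ == "__main__":
    p_x, p_y = 1.600, 1.200
    print(f"R_o(y) from (5):     {big_r_y(p_x, p_y):.4f}")
    print(f"r_o(y) * P_x / P_y:  {small_r_y(p_x, p_y) * p_x / p_y:.4f}")
    print(f"R_o(x) * R_o(y):     {big_r_x(p_x, p_y) * big_r_y(p_x, p_y):.4f}")
```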


If the four standard deviations are available, the values of $r_{o(y)}$ and $r_{o(x)}$ may be compared to that of $r_o$, the correlation obtained for the small range. If $r_o = r_{o(x)}$, $R_{o(x)}$ may be obtained through (1a). If $r_o = r_{o(y)}$, $R_{o(y)}$ may be estimated from (1). If $r_o \neq r_{o(x)}$ and $r_o \neq r_{o(y)}$, the significance of the difference must be determined in each case.


The significance of a difference may be deduced from the critical ratio, that is, from the ratio of the difference to the standard error of the difference. Let $d$ stand for $r_{o(y)} - r_o$. The standard error of the difference, $\sigma_d$, is

$$\sigma_d = \sqrt{\sigma_{r_{o(y)}}^2 + \sigma_{r_o}^2 - 2\,r_{r_{o(y)}r_o}\,\sigma_{r_{o(y)}}\,\sigma_{r_o}}$$

If $r_{o(y)}$ and $r_o$ are uncorrelated,*

$$\sigma_d = \sqrt{\sigma_{r_{o(y)}}^2 + \sigma_{r_o}^2}$$


* This assumption is made in order to obtain an approximation to the value of $\sigma_d$. The correct value can be obtained only by taking this correlation into account and evaluating it. This I have been unable to do, but I think that the approximation offered is better than nothing. I hope that some mathematician will take the time to derive the exact value of this standard error.




March, 1939] 


and the critical ratio, CR, is

$$CR = \frac{d}{\sigma_d} = \frac{r_{o(y)} - r_o}{\sqrt{\sigma_{r_{o(y)}}^2 + \sigma_{r_o}^2}} \tag{6}$$


$\sigma_{r_o}$ may be easily calculated from the approximate formula

$$\sigma_{r_o} = \frac{1 - r_o^2}{\sqrt{n}} \tag{7}$$

which involves the assumptions of rectilinearity, homoscedasticity, and mesokurtosis. Here n is the number of cases in the small range. To obtain the CR, the value of $\sigma_{r_{o(y)}}$ is needed.
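Formulas (6) and (7), together with the standard error of a difference, translate into a few small functions. This Python sketch is illustrative only, and the function names are hypothetical. The correlation between the two estimates defaults to zero, following the approximation adopted in the footnote. The example numbers are those of the practical illustration at the end of this section. The sign of the result only shows the direction of the difference; its magnitude is what is compared with 2.00 there.

```python
import math

def se_r(r, n):
    """Equation (7): approximate standard error of a correlation r based on n cases.
    Assumes rectilinearity, homoscedasticity, and mesokurtosis."""
    return (1 - r**2) / math.sqrt(n)

def se_diff(se_a, se_b, r_ab=0.0):
    """Standard error of the difference of two statistics; r_ab is their
    correlation (taken as zero, as in the article's approximation)."""
    return math.sqrt(se_a**2 + se_b**2 - 2 * r_ab * se_a * se_b)

def critical_ratio(r_pred, r_obs, se_pred, se_obs, r_ab=0.0):
    """Equation (6): CR = d / sigma_d with d = r_pred - r_obs."""
    return (r_pred - r_obs) / se_diff(se_pred, se_obs, r_ab)

if __name__ == "__main__":
    se_obs = se_r(0.8000, 100)
    cr = critical_ratio(0.5331, 0.8000, 0.06877, se_obs)
    print(f"sigma_r_o = {se_obs:.3f}, CR = {cr:.2f}")
```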


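For convenience, equation (4) is reproduced below.

$$r_{o(y)} = \frac{\sqrt{P_y^2-1}}{\sqrt{P_x^2-1}} \tag{4}$$

Taking logarithmic differentials,

$$\log r_{o(y)} = \tfrac{1}{2}\log(P_y^2-1) - \tfrac{1}{2}\log(P_x^2-1)$$

and

$$\frac{dr_{o(y)}}{r_{o(y)}} = \frac{P_y\,dP_y}{P_y^2-1} - \frac{P_x\,dP_x}{P_x^2-1}$$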
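The logarithmic differential of (4) is a first-order approximation, so it can be checked against a direct finite difference. This Python sketch perturbs $P_x$ and $P_y$ by small amounts, which are arbitrary choices for the example. It then compares the exact relative change in $r_{o(y)}$ with the linear formula above. The two should agree up to terms of second order in the perturbations.

```python
import math

def r_small_y(p_x, p_y):
    """Equation (4)."""
    return math.sqrt(p_y**2 - 1) / math.sqrt(p_x**2 - 1)

def dlog_r(p_x, p_y, dp_x, dp_y):
    """Logarithmic differential of (4): dr/r = P_y dP_y/(P_y^2-1) - P_x dP_x/(P_x^2-1)."""
    return p_y * dp_y / (p_y**2 - 1) - p_x * dp_x / (p_x**2 - 1)

p_x, p_y = 1.600, 1.200
dp_x, dp_y = 1e-6, -2e-6                      # small, arbitrary perturbations
r0 = r_small_y(p_x, p_y)
r1 = r_small_y(p_x + dp_x, p_y + dp_y)
finite = (r1 - r0) / r0                       # relative change computed directly
linear = dlog_r(p_x, p_y, dp_x, dp_y)         # first-order prediction
print(f"finite difference: {finite:.6e}   differential: {linear:.6e}")
```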





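Squaring, summing, and dividing by the number of samples,

$$\frac{\sigma_{r_{o(y)}}^2}{r_{o(y)}^2} = \frac{P_y^2\,\sigma_{P_y}^2}{(P_y^2-1)^2} + \frac{P_x^2\,\sigma_{P_x}^2}{(P_x^2-1)^2} - \frac{2P_xP_y\,r_{P_xP_y}\,\sigma_{P_x}\sigma_{P_y}}{(P_x^2-1)(P_y^2-1)} \tag{8}$$

But

$$P_y = \frac{\Sigma_y}{\sigma_y}$$

Taking logarithmic differentials,

$$\frac{dP_y}{P_y} = \frac{d\Sigma_y}{\Sigma_y} - \frac{d\sigma_y}{\sigma_y}$$

Squaring, summing, and dividing by the number of samples,

$$\frac{\sigma_{P_y}^2}{P_y^2} = \frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2} - \frac{2\,r_{\Sigma_y\sigma_y}\,\sigma_{\Sigma_y}\sigma_{\sigma_y}}{\Sigma_y\sigma_y}$$

If $\Sigma_y$ and $\sigma_y$ are uncorrelated,

$$\sigma_{P_y}^2 = P_y^2\left(\frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2}\right) \tag{9}$$

By well known formulas which assume mesokurtic distributions,

$$\sigma_{\Sigma_y} = \frac{\Sigma_y}{\sqrt{2N}} \tag{10}$$

$$\sigma_{\sigma_y} = \frac{\sigma_y}{\sqrt{2n}} \tag{10a}$$

$$\sigma_{\Sigma_x} = \frac{\Sigma_x}{\sqrt{2N}} \tag{10b}$$

$$\sigma_{\sigma_x} = \frac{\sigma_x}{\sqrt{2n}} \tag{10c}$$

where N is the number of cases in the large range and n is the number of cases in the small range. Substituting (10) and (10a) in (9),

$$\sigma_{P_y}^2 = P_y^2\left(\frac{1}{2N} + \frac{1}{2n}\right) \tag{11}$$

Likewise,

$$\sigma_{P_x}^2 = P_x^2\left(\frac{1}{2N} + \frac{1}{2n}\right) \tag{11a}$$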
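Formula (11) rests on two things: the normal-theory standard errors (10) and (10a), and treating $\Sigma_y$ and $\sigma_y$ as uncorrelated. A quick Monte Carlo sketch in Python can check it under exactly those conditions. The sketch draws the large-range and small-range samples independently; this is its own assumption, since in practice the small range is usually part of the large one. The true standard deviations (1.6 and 1.0), the sample sizes and the seed are arbitrary choices. The printed ratio should be close to 1, up to sampling error.

```python
import math
import random
import statistics

def sd(values):
    """Standard deviation with divisor n (the classical formula)."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / len(values))

def var_ratio_of_p(n_big=400, n_small=100, reps=1500, seed=1):
    """Observed variance of P = Sigma/sigma over repeated independent normal
    samples, divided by the value P^2 (1/(2N) + 1/(2n)) predicted by (11)."""
    rng = random.Random(seed)
    big_sd, small_sd = 1.6, 1.0          # true SDs, so the true P is 1.6
    ps = []
    for _ in range(reps):
        big = [rng.gauss(0.0, big_sd) for _ in range(n_big)]
        small = [rng.gauss(0.0, small_sd) for _ in range(n_small)]
        ps.append(sd(big) / sd(small))
    predicted = (big_sd / small_sd) ** 2 * (1 / (2 * n_big) + 1 / (2 * n_small))
    return statistics.pvariance(ps) / predicted

if __name__ == "__main__":
    print(f"observed / predicted variance of P: {var_ratio_of_p():.3f}")
```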


Now $r_{P_xP_y}$ is the only unknown expression in equation (8). Its equivalent may be found as follows:


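$$\log\frac{P_y}{P_x} = \log P_y - \log P_x = \log\Sigma_y - \log\sigma_y - \log\Sigma_x + \log\sigma_x$$

Taking differentials,

$$\frac{dP_y}{P_y} - \frac{dP_x}{P_x} = \frac{d\Sigma_y}{\Sigma_y} - \frac{d\sigma_y}{\sigma_y} - \frac{d\Sigma_x}{\Sigma_x} + \frac{d\sigma_x}{\sigma_x}$$

Squaring, summing, and dividing by the number of samples,

$$\begin{aligned}
\frac{\sigma_{P_y}^2}{P_y^2} + \frac{\sigma_{P_x}^2}{P_x^2} - \frac{2\,r_{P_xP_y}\,\sigma_{P_x}\sigma_{P_y}}{P_xP_y}
&= \frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2} + \frac{\sigma_{\Sigma_x}^2}{\Sigma_x^2} + \frac{\sigma_{\sigma_x}^2}{\sigma_x^2} \\
&\quad - \frac{2\,r_{\Sigma_y\sigma_y}\,\sigma_{\Sigma_y}\sigma_{\sigma_y}}{\Sigma_y\sigma_y} - \frac{2\,r_{\Sigma_x\Sigma_y}\,\sigma_{\Sigma_x}\sigma_{\Sigma_y}}{\Sigma_x\Sigma_y} + \frac{2\,r_{\Sigma_y\sigma_x}\,\sigma_{\Sigma_y}\sigma_{\sigma_x}}{\Sigma_y\sigma_x} \\
&\quad + \frac{2\,r_{\Sigma_x\sigma_y}\,\sigma_{\Sigma_x}\sigma_{\sigma_y}}{\Sigma_x\sigma_y} - \frac{2\,r_{\sigma_x\sigma_y}\,\sigma_{\sigma_x}\sigma_{\sigma_y}}{\sigma_x\sigma_y} - \frac{2\,r_{\Sigma_x\sigma_x}\,\sigma_{\Sigma_x}\sigma_{\sigma_x}}{\Sigma_x\sigma_x}
\end{aligned} \tag{12}$$

By another well known formula, which involves the usual assumptions,

$$r_{\Sigma_x\Sigma_y} = R_o^2 \tag{13}$$

and

$$r_{\sigma_x\sigma_y} = r_o^2 \tag{13a}$$

where $R_o$ is the correlation that would be obtained in the large range.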
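Formula (13) says that, over repeated samples from a normal bivariate population, the standard deviations of x and y are themselves correlated. The size of that correlation is approximately the square of the population correlation. A small Python simulation can illustrate this. The population correlation .7, the sample size, the number of replications and the seed are arbitrary assumptions, so the result should land near .49 only up to sampling error.

```python
import math
import random

def sd(values):
    """Standard deviation with divisor n."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / len(values))

def pearson_r(a, b):
    """Product-moment correlation of two equal-length lists."""
    ma, mb = math.fsum(a) / len(a), math.fsum(b) / len(b)
    sab = math.fsum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = math.fsum((x - ma) ** 2 for x in a)
    sbb = math.fsum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def corr_of_sds(rho=0.7, n=200, reps=1500, seed=2):
    """Correlation, across replications, between the sample SDs of x and y
    drawn from a bivariate normal population with correlation rho."""
    rng = random.Random(seed)
    k = math.sqrt(1 - rho**2)
    sds_x, sds_y = [], []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + k * z2)   # corr(x, y) = rho by construction
        sds_x.append(sd(xs))
        sds_y.append(sd(ys))
    return pearson_r(sds_x, sds_y)

if __name__ == "__main__":
    print(f"r between SDs: {corr_of_sds():.3f}  (rho^2 = {0.7**2:.2f})")
```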

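Equation (12) may be greatly simplified by substituting three sets of values in it:

- the standard errors given by (10), (10a), (10b), and (10c);
- the values of $\sigma_{P_x}$ and $\sigma_{P_y}$ obtained through (11) and (11a);
- the values of $r_{\Sigma_x\Sigma_y}$ and $r_{\sigma_x\sigma_y}$ given by (13) and (13a).

Doing so, while assuming that $r_{\Sigma_y\sigma_y}$, $r_{\Sigma_x\sigma_x}$, $r_{\Sigma_y\sigma_x}$, and $r_{\Sigma_x\sigma_y}$ are all equal to zero, gives

$$\frac{1}{N} + \frac{1}{n} - \frac{2\,r_{P_xP_y}\,\sigma_{P_x}\sigma_{P_y}}{P_xP_y} = \frac{1}{N} + \frac{1}{n} - \frac{R_o^2}{N} - \frac{r_o^2}{n}$$

whence

$$2\,r_{P_xP_y}\,\sigma_{P_x}\sigma_{P_y} = P_xP_y\left(\frac{R_o^2}{N} + \frac{r_o^2}{n}\right) \tag{14}$$

As $R_o$ is not known, $R_{o(y)}$ may be put in its place without appreciably affecting the final result. Substituting in (8) from (11), (11a), and (14),

$$\frac{\sigma_{r_{o(y)}}^2}{r_{o(y)}^2} = \left(\frac{1}{2N} + \frac{1}{2n}\right)\left[\frac{P_y^4}{(P_y^2-1)^2} + \frac{P_x^4}{(P_x^2-1)^2}\right] - \frac{P_x^2P_y^2}{(P_x^2-1)(P_y^2-1)}\left(\frac{R_{o(y)}^2}{N} + \frac{r_o^2}{n}\right) \tag{15}$$

The right-hand member of (15) is approximately equal to

$$\left(\frac{1}{2N} + \frac{1}{2n}\right)\left(\frac{P_y^2}{P_y^2-1} - \frac{P_x^2}{P_x^2-1}\right)^2,$$

which is the value it takes when $R_{o(y)}^2$ and $r_o^2$ are both replaced by unity. This value is smaller than the right-hand member of (15), but the approximation is generally close enough for all practical purposes if N and n are large. Since $P^2/(P^2-1) = 1 + 1/(P^2-1)$, to this degree of approximation,

$$\sigma_{r_{o(y)}} = r_{o(y)}\sqrt{\frac{1}{2N} + \frac{1}{2n}}\,\left|\frac{1}{P_y^2-1} - \frac{1}{P_x^2-1}\right| \tag{16}$$

Substituting in (6) this value and that given by (7) for $\sigma_{r_o}$,

$$CR = \frac{r_{o(y)} - r_o}{\sqrt{r_{o(y)}^2\left(\frac{1}{2N} + \frac{1}{2n}\right)\left(\frac{1}{P_y^2-1} - \frac{1}{P_x^2-1}\right)^2 + \frac{(1-r_o^2)^2}{n}}} \tag{17}$$

Suppose the correlation in the large range is known and the correlation within the small range is to be estimated, with the y-variable assumed to be rectilinear and homoscedastic. Then the standard error of $R_{o(y)}$ is needed in order to obtain an equation corresponding to (17). This derivation is not given here, as it is too long and too similar to that just given for the standard error of $r_{o(y)}$. Making the same assumptions as before, and if N and n are large, the following formula gives an approximate value for the standard error of $R_{o(y)}$:

$$\sigma_{R_{o(y)}} = R_{o(y)}\sqrt{\frac{1}{2N} + \frac{1}{2n}}\,\left|\frac{1}{P_y^2-1} - \frac{1}{P_x^2-1}\right| \tag{16a}$$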
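To see how much the large-sample formula gives away, one can evaluate (15) and (16) side by side. The Python sketch below is illustrative only, and the function names are hypothetical. The inputs are an assumption: the illustration's $P_x$, $P_y$, $r_o$, $n$ and $N$, with $r_{o(y)}$ from (4) and $R_{o(y)}$ from (5). The article does not state exactly which values were substituted into (15), so the sketch does not try to reproduce the figure quoted for it in the illustration. The same `se_approx` function serves for (16a) when passed $R_{o(y)}$ instead of $r_{o(y)}$.

```python
import math

def sampling_factor(n_big, n_small):
    """The common factor 1/(2N) + 1/(2n) appearing in (11), (15) and (16)."""
    return 1 / (2 * n_big) + 1 / (2 * n_small)

def se_full(r_y, big_r_y, r_obs, p_x, p_y, n_big, n_small):
    """Equation (15), with R_o(y) substituted for the unknown R_o."""
    a = p_y**2 / (p_y**2 - 1)
    b = p_x**2 / (p_x**2 - 1)
    rel_var = (sampling_factor(n_big, n_small) * (a * a + b * b)
               - a * b * (big_r_y**2 / n_big + r_obs**2 / n_small))
    return r_y * math.sqrt(rel_var)

def se_approx(corr, p_x, p_y, n_big, n_small):
    """Equation (16) for corr = r_o(y); equation (16a) for corr = R_o(y)."""
    diff = 1 / (p_y**2 - 1) - 1 / (p_x**2 - 1)
    return corr * math.sqrt(sampling_factor(n_big, n_small)) * abs(diff)

p_x, p_y, r_obs, n_small, n_big = 1.600, 1.200, 0.8000, 100, 400
r_y = math.sqrt(p_y**2 - 1) / math.sqrt(p_x**2 - 1)                     # equation (4)
big_r_y = p_x * math.sqrt(p_y**2 - 1) / (p_y * math.sqrt(p_x**2 - 1))   # equation (5)

full = se_full(r_y, big_r_y, r_obs, p_x, p_y, n_big, n_small)
approx = se_approx(r_y, p_x, p_y, n_big, n_small)
print(f"(15): {full:.4f}   (16): {approx:.4f}")
assert approx <= full   # (16) never exceeds (15) when both correlations are at most 1
```

Setting both correlations in (15) to 1 makes the two functions agree. This is a quick way to confirm that (16) is (15) with $R_{o(y)}^2$ and $r_o^2$ replaced by unity.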
The corresponding critical ratio is

$$CR = \frac{R_{o(y)} - R_o}{\sqrt{\sigma_{R_{o(y)}}^2 + \sigma_{R_o}^2}} \tag{17a}$$

with $\sigma_{R_{o(y)}}$ given by (16a) and $\sigma_{R_o}$ by (7).

Equations (16a) and (17a) are equal to equations (16) and (17) respectively, except for the fact that the correlations in the large range have taken the place of the correlations in the small range.

For a practical illustration, let $P_y = 1.200$, $P_x = 1.600$, $r_o = .8000$, $n = 100$, and $N = 400$.

By (4),

$$r_{o(y)} = \frac{\sqrt{1.200^2 - 1}}{\sqrt{1.600^2 - 1}} = .5331$$

By (16),

$$\sigma_{r_{o(y)}} = .5331\sqrt{\frac{1}{800} + \frac{1}{200}}\left(\frac{1}{.44} - \frac{1}{1.56}\right) = .06877$$

By (17),

$$CR = \frac{.8000 - .5331}{\sqrt{.06877^2 + (1 - .80^2)^2/100}} = 3.44$$

By formula (15) the value of $\sigma_{r_{o(y)}}$ was found to be .1076 instead of the .06877 given by (16), and that of CR was found to be 2.35 instead of 3.44. This gives an idea of the accuracy of formula (16). In either case the difference may be considered significant. Data yielding a critical ratio greater than 2.00 should not be accepted as fulfilling the conditions of rectilinearity and homoscedasticity for the purpose described here.
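The arithmetic of the illustration can be rechecked with the short Python sketch below, which hard-codes the values given above. The function names are hypothetical. Evaluating (4) directly gives about .5311 rather than the printed .5331, so the sketch runs (16) and (17) both ways:

- With .5331 it reproduces the printed .06877 and 3.44.
- With the directly computed value it gives about .0685 and 3.47.

In both cases the critical ratio is well above 2.00, so the conclusion drawn from the example is unaffected. Formula (15) is not rechecked here, because the values substituted into it are not stated.

```python
import math

# Values from the practical illustration
p_x, p_y, r_obs, n_small, n_big = 1.600, 1.200, 0.8000, 100, 400

def r_small_y(p_x, p_y):
    """Equation (4)."""
    return math.sqrt(p_y**2 - 1) / math.sqrt(p_x**2 - 1)

def se_16(r_y):
    """Equation (16): approximate standard error of r_o(y)."""
    factor = math.sqrt(1 / (2 * n_big) + 1 / (2 * n_small))
    return r_y * factor * abs(1 / (p_y**2 - 1) - 1 / (p_x**2 - 1))

def cr_17(r_y):
    """Equation (17), reported as a magnitude as in the worked example."""
    se_obs = (1 - r_obs**2) / math.sqrt(n_small)   # equation (7)
    return abs(r_y - r_obs) / math.sqrt(se_16(r_y)**2 + se_obs**2)

for label, r_y in (("printed r_o(y)", 0.5331), ("(4) evaluated directly", r_small_y(p_x, p_y))):
    print(f"{label}: r_o(y) = {r_y:.4f}, sigma = {se_16(r_y):.5f}, CR = {cr_17(r_y):.2f}")
```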